Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (10): 201-205.


Improved BIRCH clustering algorithm based on density

WEI Xiang   

  1. Department of Computer Science, University of Honghe, Mengzi, Yunnan 661100, China
  • Online: 2013-05-15   Published: 2013-05-14

基于密度的改进BIRCH聚类算法

韦  相   

  1. 红河学院 计算机科学与技术系,云南 蒙自 661100

Abstract: Because the traditional BIRCH clustering algorithm uses the diameter to control cluster boundaries, it handles clusters of arbitrary shape poorly. This paper proposes an improved algorithm that incorporates the approach of the DBSCAN algorithm and builds multiple CF-trees, each representing one sub-cluster. Experiments show that the proposed algorithm is efficient and scalable for clusters of arbitrary shape, supports incremental clustering, has the same time complexity as the BIRCH algorithm, and filters out noise.
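As background for the diameter criterion mentioned in the abstract, the following is a minimal sketch of the standard BIRCH clustering feature (CF), which stores only the point count, the linear sum, and the sum of squared norms. The class name `CF` and its method names are illustrative assumptions, not taken from the paper; the sketch shows only why CFs can be merged incrementally and how the centroid and diameter follow from them.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CF:
    """Clustering feature: (N, LS, SS) summary of a set of points.

    Hypothetical helper for illustration; not the paper's code.
    """
    n: int            # number of points summarised
    ls: List[float]   # linear sum of the points, per dimension
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "CF":
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other: "CF") -> "CF":
        # CF additivity: the summary of a union is the component-wise sum.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def diameter(self) -> float:
        # Root of the average squared pairwise distance:
        # D^2 = (2*N*SS - 2*|LS|^2) / (N*(N-1))
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(d2, 0.0))  # guard against rounding below zero
```

Because a diameter threshold bounds every sub-cluster by the same roughly spherical extent, an elongated or curved cluster cannot fit inside a single CF entry, which is the limitation the abstract describes.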

Key words: cluster, CF-trees, density, center of mass

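Abstract (translated from the Chinese): The traditional BIRCH algorithm uses the diameter to control cluster boundaries, so it performs poorly on non-spherical clusters and may even split a non-spherical cluster into several clusters. To address this shortcoming, the BIRCH algorithm is improved: the improved algorithm first builds multiple CF-trees, each representing one cluster, and incorporates the density-reachability idea of the DBSCAN algorithm. The algorithm can accurately cluster data of arbitrary shape. Experiments show that it clusters effectively in a single scan, has the same time complexity as the BIRCH algorithm, processes large-scale datasets quickly, supports dynamic clustering, and accurately identifies clusters of arbitrary shape while detecting noise points.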
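The density-reachability idea can be illustrated with a simplified sketch that chains together sub-cluster summaries whose centroids lie within a radius of one another. This is not the paper's algorithm: the function name `link_subclusters`, the parameters `eps` and `min_pts`, and the rule of treating sparse groups as noise are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def link_subclusters(centroids: Sequence[Sequence[float]],
                     counts: Sequence[int],
                     eps: float,
                     min_pts: int) -> List[int]:
    """Group sub-clusters by chaining centroids that lie within eps.

    Hypothetical sketch: returns one label per sub-cluster, with -1
    marking groups whose total point count is below min_pts (noise).
    """
    n = len(centroids)
    component = [None] * n   # group index of each sub-cluster
    groups = []              # members of each group

    for start in range(n):
        if component[start] is not None:
            continue
        gid = len(groups)
        component[start] = gid
        members = [start]
        stack = [start]
        # Expand the group through every chain of eps-close centroids.
        while stack:
            i = stack.pop()
            for j in range(n):
                if (component[j] is None
                        and math.dist(centroids[i], centroids[j]) <= eps):
                    component[j] = gid
                    members.append(j)
                    stack.append(j)
        groups.append(members)

    labels = [-1] * n
    next_label = 0
    for members in groups:
        # Assumed noise rule: keep only groups with enough points overall.
        if sum(counts[i] for i in members) >= min_pts:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels
```

For example, three sub-clusters with centroids on a line at spacing 1 and `eps = 1.5` end up in one group even though the first and last are farther apart than `eps`, which is how chaining lets a non-spherical cluster be recovered from small spherical summaries.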

Key words (translated from the Chinese): clustering, CF-tree, density, centroid