Computer Engineering and Applications, 2008, Vol. 44, Issue (28): 136-139. DOI: 10.3778/j.issn.1002-8331.2008.28.046

• Database, Signal and Information Processing •

New clustering algorithm based on representatives and point density

CHEN Yuan-yuan¹, CHEN Zhi-ping¹,²

  1. College of Computer and Communication, Hunan University, Changsha 410082, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2007-11-20  Revised: 2008-02-18  Online: 2008-10-01  Published: 2008-10-01
  • Contact: CHEN Yuan-yuan

Original Chinese title: 一种基于代表点和点密度的聚类算法

Authors (Chinese): 陈园园, 陈治平

Abstract: Density-based clustering algorithms do not work well when the data are unevenly distributed. To address this problem, a new clustering algorithm based on representatives and point density is proposed. The algorithm discovers clusters by examining the k nearest neighbors of each point in the database. It chooses a seed point as the first representative of a cluster and takes that representative's k nearest neighbors as its representative area. If a point in a representative area satisfies the density threshold, that point becomes a new representative. The search repeats in this way, and all linked representatives together with their representative areas form one cluster. Experimental results show that the algorithm can discover clusters of arbitrary shape and size and of differing densities.

Key words: data mining, clustering, point density, representative, density threshold
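
The expansion procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define point density, the form of the density threshold, or how seeds are chosen. Here density is assumed to be k divided by the distance to the farthest of the k nearest neighbors, a point is assumed to pass the threshold when its density is at least `ratio` times that of the representative whose area contains it, and seeds are assumed to be taken in order of decreasing density. The names `knn`, `density`, `cluster`, and `ratio` are hypothetical, and noise handling is omitted.

```python
import math
from collections import deque


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force)."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def density(points, i, neighbors):
    """Assumed density: neighbor count over the distance to the farthest one."""
    radius = max(math.dist(points[i], points[j]) for j in neighbors)
    return len(neighbors) / radius if radius > 0 else math.inf


def cluster(points, k, ratio):
    """Label each point with a cluster id by expanding representative areas."""
    n = len(points)
    nbrs = [knn(points, i, k) for i in range(n)]
    dens = [density(points, i, nbrs[i]) for i in range(n)]
    labels = [-1] * n  # -1 means "not yet assigned"
    cid = 0
    # Assumption: try the densest unassigned point as the next seed.
    for seed in sorted(range(n), key=lambda i: -dens[i]):
        if labels[seed] != -1:
            continue
        labels[seed] = cid
        queue = deque([seed])  # representatives still to be expanded
        while queue:
            rep = queue.popleft()
            for j in nbrs[rep]:  # the representative area of rep
                if labels[j] != -1:
                    continue
                labels[j] = cid
                # Assumed threshold: relative to the current representative.
                if dens[j] >= ratio * dens[rep]:
                    queue.append(j)  # j becomes a new representative
        cid += 1
    return labels
```

Calling `cluster(points, k=3, ratio=0.5)` on a list of coordinate tuples returns one integer label per point. A threshold defined relative to the local representative, rather than a single global value, is one way such a scheme could follow clusters whose densities differ; whether the paper does this is not stated in the abstract.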
