Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (19): 147-149.

• Database, Signal and Information Processing •

Research of K-means algorithm combined with neighbors and density

WANG Chunfeng (王春风), TANG Yongzheng (唐拥政)

  1. Modern Education Technology Center, Yancheng Institute of Technology, Yancheng, Jiangsu 224051, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-07-01 Published: 2011-07-01

Abstract: To address the dependence of the K-means algorithm on its initial cluster centers, a new algorithm for selecting the initial cluster centers is proposed. The algorithm takes the highest-density point in the data region as the first initial center. Then, based on the property that neighboring points belong to the same cluster, it finds the point farthest from the current initial centers, adds that point to the set of initial cluster centers, recomputes, and repeats in the same way. The initial cluster centers produced by the improved algorithm are distributed more reasonably, and the influence of isolated points on the initial centers is eliminated, so a better partition can be obtained. Experiments show that clustering with the improved algorithm achieves higher and more stable accuracy.

Key words: density, neighbors, clustering algorithm, K-means, cluster center
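
The initialization described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define density or say how isolated points are detected, so the sketch assumes density is the number of other points within a radius `eps`, and treats points with fewer than `min_neighbors` such neighbors as isolated. The function name and both parameters are hypothetical.

```python
import math


def density_farthest_init(points, k, eps, min_neighbors=2):
    """Pick k initial centers: densest point first, then farthest-point picks.

    Assumptions (not taken from the paper): density is the count of other
    points within radius eps, and points with fewer than min_neighbors such
    neighbors are treated as isolated and never chosen as centers.
    """
    n = len(points)
    # Pairwise Euclidean distances (O(n^2), fine for a small illustration).
    dist = [[math.dist(p, q) for q in points] for p in points]
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= eps)
               for i in range(n)]

    # Exclude isolated points so they cannot become initial centers.
    candidates = [i for i in range(n) if density[i] >= min_neighbors]
    if len(candidates) < k:
        raise ValueError("too few non-isolated points for k centers")

    # First center: the highest-density point.
    chosen = [max(candidates, key=lambda i: density[i])]

    # Each further center: the candidate whose distance to its nearest
    # already-chosen center is largest.
    while len(chosen) < k:
        nxt = max((i for i in candidates if i not in chosen),
                  key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(nxt)
    return [points[i] for i in chosen]


if __name__ == "__main__":
    # Two tight groups plus one isolated point at (50, 50).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (50, 50)]
    print(density_farthest_init(data, k=2, eps=1.5))
```

In this toy data set the isolated point (50, 50) has no neighbors within `eps`, so it is never selected, whereas a plain farthest-point rule would pick it. The returned centers would then be passed to a standard K-means iteration as its starting centers.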