Computer Engineering and Applications, 2010, Vol. 46, Issue (16): 27-31. DOI: 10.3778/j.issn.1002-8331.2010.16.008

• Doctoral Forum •

New method for determining optimal number of clusters in K-means clustering algorithm

ZHOU Shi-bing (1), XU Zhen-yuan (1, 2), TANG Xu-qing (2)

  1. School of Information Technology, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2010-01-05  Revised: 2010-03-26  Online: 2010-06-01  Published: 2010-06-01
  • Contact: ZHOU Shi-bing

Abstract: The K-means clustering algorithm partitions a dataset on the premise that the number of clusters k is fixed in advance and that the initial cluster centers are selected at random. In general, the value of k cannot be determined beforehand, and randomly selected initial centers tend to make the clustering result unstable. A new method for determining the optimal number of clusters in the K-means clustering algorithm is presented. The parameter of the AP algorithm is set so that the number of clusters produced by AP serves as the upper limit kmax of the search range for the number of clusters; the Silhouette index is selected as the validity index; and the initial cluster centers are set following the idea of the maximum-minimum distance algorithm. Clustering quality is then analyzed over this range to determine the optimal number of clusters. Simulation experiments and analysis demonstrate the feasibility of the proposed scheme.
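
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: `k_max` is passed in as a plain argument rather than obtained from an AP run, the max-min seeding starts from the first data point (an arbitrary choice), distances are Euclidean, and the silhouette value of a point is taken in its usual form s = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the smallest mean distance to any other cluster. All function names are hypothetical.

```python
import math

def maxmin_centers(points, k):
    """Max-min distance seeding: start from the first point (assumption),
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain K-means with deterministic max-min seeding; returns labels."""
    centers = maxmin_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette value; needs at least two non-empty clusters."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes s = 0 by convention
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for l, idx in clusters.items() if l != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_max):
    """Score every k in 2..k_max and return (best k, all scores)."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = kmeans(points, k)
        if len(set(labels)) >= 2:
            scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores
```

Calling `best_k(points, k_max)` with `points` as a list of coordinate tuples returns the k with the highest mean silhouette along with the score for each k tried. Because the seeding is deterministic, repeated runs on the same data give the same result, which reflects the stability concern raised in the abstract.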
