Computer Engineering and Applications, 2010, Vol. 46, Issue (16): 27-31. DOI: 10.3778/j.issn.1002-8331.2010.16.008
• Doctoral Forum •
ZHOU Shi-bing1,XU Zhen-yuan1,2,TANG Xu-qing2
周世兵1,徐振源1,2,唐旭清2
Abstract: The K-means clustering algorithm assumes that the number of clusters k is fixed in advance and that the initial cluster centers are chosen at random. In general, k cannot be determined beforehand, and randomly chosen initial centers make the clustering result unstable. This paper presents a new method for determining the optimal number of clusters for the K-means algorithm. The parameters of the AP algorithm are set so that the number of clusters it produces serves as the upper bound kmax of the search range for the number of clusters; the Silhouette validity index is chosen to assess clustering quality; and the initial cluster centers are set following the idea of the maximum-minimum distance algorithm. The clustering quality is then analyzed to determine the optimal number of clusters. Simulation experiments and analysis demonstrate the feasibility of the proposed method.
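The search described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the upper bound is already known and passed in as the hypothetical argument `k_max` (in the paper this value comes from the AP algorithm, which is not reproduced here), and it uses a simple farthest-first reading of the maximum-minimum distance idea, seeded with the first data point. All function names are illustrative.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_centers(points, k):
    """Pick k initial centers: start from the first point (an assumption),
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Standard K-means iterations from max-min initial centers; returns labels."""
    centers = maxmin_centers(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean Silhouette value over all points (singleton clusters score 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_max):
    """Search k = 2..k_max and return the k with the highest mean Silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores
```

Calling `best_k(points, k_max)` returns the selected k together with the Silhouette score for every candidate, so the whole curve can be inspected rather than only its maximum. Because the initial centers are chosen deterministically, repeated runs on the same data give the same result.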
CLC Number: TP18
ZHOU Shi-bing1,XU Zhen-yuan1,2,TANG Xu-qing2. New method for determining optimal number of clusters in K-means clustering algorithm[J]. Computer Engineering and Applications, 2010, 46(16): 27-31.
周世兵1,徐振源1,2,唐旭清2. 新的K-均值算法最佳聚类数确定方法[J]. 计算机工程与应用, 2010, 46(16): 27-31.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2010.16.008
http://cea.ceaj.org/EN/Y2010/V46/I16/27