New method for determining optimal number of clusters in K-means clustering algorithm

doi:10.3778/j.issn.1002-8331.2010.16.008

Computer Engineering and Applications, 2010, Vol. 46, Issue (16): 27-31. DOI: 10.3778/j.issn.1002-8331.2010.16.008

• Doctoral Forum •

ZHOU Shi-bing¹, XU Zhen-yuan¹,², TANG Xu-qing²

1. School of Information Technology, Jiangnan University, Wuxi, Jiangsu 214122, China
2. School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China

Received: 2010-01-05  Revised: 2010-03-26  Online: 2010-06-01  Published: 2010-06-01
Contact: ZHOU Shi-bing

Abstract

The K-means clustering algorithm partitions a dataset on the premise that the number of clusters k is fixed in advance and that the initial cluster centers are selected at random. In practice, k usually cannot be determined beforehand, and randomly selected initial centers make the clustering result unstable. This paper presents a new method for determining the optimal number of clusters in K-means. By setting the parameters of the affinity propagation (AP) algorithm, the number of clusters that AP produces is taken as the upper bound kmax of the search range for the number of clusters. The Silhouette index is chosen as the validity index, and the initial cluster centers are set following the max-min distance algorithm. Clustering quality is then analyzed for each candidate k to determine the optimal number of clusters. Simulation experiments and analysis demonstrate the feasibility of the proposed scheme.
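The search described in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the AP step is omitted and its result is simply passed in as `k_max`, the max-min seeding arbitrarily starts from the first data point, and all function names (`maxmin_init`, `kmeans`, `silhouette`, `best_k`) are hypothetical.

```python
import math

def maxmin_init(points, k):
    """Max-min distance seeding (assumption: start from the first point).
    Repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations from deterministic max-min seeds; returns labels."""
    centers = maxmin_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_max):
    """Evaluate k = 2..k_max and return the k with the largest mean silhouette.
    k_max stands in for the cluster count that AP would supply."""
    scores = {k: silhouette(points, kmeans(points, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated 2x2 squares of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
k, scores = best_k(points, k_max=5)
print(k, scores)
```

On this toy dataset the sketch selects k = 3. Because the seeding is deterministic, repeated runs give the same result, which is the stability benefit the abstract attributes to replacing random initialization.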

CLC Number: TP18

ZHOU Shi-bing, XU Zhen-yuan, TANG Xu-qing. New method for determining optimal number of clusters in K-means clustering algorithm[J]. Computer Engineering and Applications, 2010, 46(16): 27-31.

