K-means Clustering Algorithm of Optimizing Initial Clustering Center

doi:10.3778/j.issn.1002-8331.1910-0220

Abstract:

To address the sensitivity of the traditional K-means algorithm to its initial centers, and the resulting instability of its clustering results, an improved K-means clustering algorithm is proposed. The algorithm first computes the pairwise distances between samples and places the two closest points in a set. Using a point-to-set distance formula, it then repeatedly adds the point nearest to the set until the set contains at least α points, where α is the ratio of the number of data points in the sample set to the number of clusters. The set is then removed from the sample set. These steps are repeated until K sets are obtained, where K is the number of clusters. The mean of each set serves as an initial center, and the final clustering result is obtained by running the K-means algorithm from these centers. On the Wine, Hayes-Roth, Iris, Tae, Heart-stalog, Ionosphere and Haberman datasets, the improved algorithm gives more stable clustering results than traditional K-means and K-means++. On the Wine, Iris and Tae datasets, it achieves higher clustering accuracy than the K-means algorithm that optimizes the initial clustering centers by minimum variance, and it attains the largest silhouette coefficients and F1 values on all seven datasets. For datasets with large density differences, the improved algorithm is more stable and more accurate than traditional K-means and K-means++, and more efficient than the minimum-variance K-means algorithm.
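The initialization described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the point-to-set distance formula, so `point_set_distance` assumes the mean Euclidean distance from a point to the set's members, and the function names `initial_centers` and `kmeans` are hypothetical. Ties between equally close pairs are broken by input order, and if the samples run out before K sets are formed, fewer than K centers are returned.

```python
import math

def point_set_distance(p, members):
    # ASSUMPTION: the abstract does not state the point-to-set formula;
    # the mean Euclidean distance to the set's members is used here.
    return sum(math.dist(p, q) for q in members) / len(members)

def initial_centers(points, k):
    """Grow k sets from closest pairs and return the mean of each set."""
    alpha = len(points) / k              # target set size: n / K
    remaining = list(points)
    centers = []
    for _ in range(k):
        if not remaining:
            break                        # samples exhausted before k sets
        if len(remaining) == 1:
            group = [remaining.pop()]
        else:
            # Seed the set with the two closest remaining points.
            pairs = ((a, b) for a in range(len(remaining))
                     for b in range(a + 1, len(remaining)))
            i, j = min(pairs, key=lambda ab: math.dist(remaining[ab[0]],
                                                       remaining[ab[1]]))
            group = [remaining[i], remaining[j]]
            remaining.pop(j)             # j > i, so remove j first
            remaining.pop(i)
        # Add the nearest remaining point until the set holds >= alpha points.
        while len(group) < alpha and remaining:
            idx = min(range(len(remaining)),
                      key=lambda t: point_set_distance(remaining[t], group))
            group.append(remaining.pop(idx))
        # The mean of the set becomes one initial center.
        centers.append(tuple(sum(x) / len(group) for x in zip(*group)))
    return centers

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break                        # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

# Toy data: two well-separated groups of three points each.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = initial_centers(samples, 2)
final_centers, labels = kmeans(samples, init)
```

Because the initialization involves no random draws, repeated runs on the same data start from the same centers, which is consistent with the stability claim in the abstract. The closest-pair search in this sketch is quadratic in the number of remaining points, so it is suitable only for small datasets.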

Key words: K-means clustering algorithm, algorithm optimization, initial clustering center

GUO Yongkun, ZHANG Xinyou, LIU Liping, DING Liang, NIU Xiaolu. K-means Clustering Algorithm of Optimizing Initial Clustering Center[J]. Computer Engineering and Applications, 2020, 56(15): 172-178.
