Partial Iterative Fast K-means Clustering Algorithm

doi:10.3778/j.issn.1002-8331.1906-0070

Abstract

The K-means algorithm is one of the most popular and widely used clustering methods. However, it is not always possible to find appropriate initial values for the cluster centers, especially as the number of clusters grows, and unsuitable initial values degrade the clustering result. This paper proposes an iterative approach to improve clustering quality, called Partial Iterative Fast K-means plus-minus (PIFKM+-). Starting from a K-means clustering, each iteration of the algorithm splits one cluster, removes another, and then re-clusters only the affected data. Because only part of the data is re-clustered, the algorithm reduces the time complexity of updating the clustering while improving the clustering result. The proposed method updates clusters quickly, is insensitive to the initial values of the cluster centers, and improves clustering accuracy when the number of clusters is large. In comparisons with K-means and K-means++ on simulated and real data sets, the experimental results show that the algorithm achieves better clustering quality, higher operating efficiency, and better scalability. Statistical analysis of the final experimental results shows that PIFKM+- improves clustering accuracy without losing much time efficiency.

Key words: K-means algorithm, cluster segmentation, cluster removing, partial iterative clustering, cluster neighbor

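Abstract (translated from Chinese):

To address the problem that, as the number of clusters increases, K-means clusters poorly when unsuitable initial centers are chosen, a partial iterative fast K-means clustering algorithm (PIFKM+-) is proposed. Building on K-means clustering, the algorithm repeatedly searches for a cluster that can be split and a cluster that can be removed, and re-clusters only the affected local data. This lowers the time complexity of the overall clustering update and improves the clustering result. When the number of clusters is large, PIFKM+- updates clusters quickly, is insensitive to the initial cluster centers, and improves clustering accuracy. Comparisons with K-means and K-means++ in experiments on simulated and real data sets verify the accuracy, efficiency, and scalability of the algorithm, and statistical analysis of the experimental results shows that it improves clustering accuracy without losing much time efficiency.

Key words: K-means algorithm, cluster splitting, cluster removal, partial iterative clustering, cluster neighbor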
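
The split-and-remove iteration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the cluster to split or the cluster to remove is chosen, so the sketch assumes the cluster with the largest within-cluster squared error is split and the cluster whose center lies closest to another center is removed. The names `plus_minus_step` and `pifkm_sketch` are hypothetical. Only the points of the split cluster, the removed cluster, and the removed cluster's nearest neighbor are re-clustered, which is one possible reading of "partial" re-clustering.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def lloyd(points, centers, iters=20):
    """Plain K-means refinement; an empty cluster keeps its previous center."""
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = mean(members)
    return centers

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def plus_minus_step(points, centers, rng):
    """Propose new centers by splitting one cluster and removing another."""
    k = len(centers)
    if k < 3:
        return list(centers)
    labels = assign(points, centers)
    cost = [0.0] * k
    for p, l in zip(points, labels):
        cost[l] += sq_dist(p, centers[l])
    # Assumption: split the cluster with the largest within-cluster error.
    split = max(range(k), key=cost.__getitem__)
    members = [p for p, l in zip(points, labels) if l == split]
    if len(members) < 2:
        return list(centers)

    def nearest_other(j):
        return min((i for i in range(k) if i != j),
                   key=lambda i: sq_dist(centers[j], centers[i]))

    # Assumption: remove the cluster whose center is closest to another center.
    remove = min((j for j in range(k) if j != split),
                 key=lambda j: sq_dist(centers[j], centers[nearest_other(j)]))
    affected = {split, remove, nearest_other(remove)}
    # Split: 2-means restricted to the members of the chosen cluster.
    halves = lloyd(members, rng.sample(members, 2))
    # Partial re-clustering: refine only the affected points and centers.
    local_centers = halves + [centers[j] for j in affected - {split, remove}]
    local_points = [p for p, l in zip(points, labels) if l in affected]
    local_centers = lloyd(local_points, local_centers)
    untouched = [c for j, c in enumerate(centers) if j not in affected]
    return untouched + local_centers

def pifkm_sketch(points, k, rounds=10, seed=0):
    """K-means followed by split/remove steps, kept only while SSE drops."""
    rng = random.Random(seed)
    centers = lloyd(points, rng.sample(points, k))
    for _ in range(rounds):
        candidate = plus_minus_step(points, centers, rng)
        if sse(points, candidate) >= sse(points, centers):
            break
        centers = candidate
    return centers
```

Because a candidate is accepted only when it lowers the total squared error, the result of `pifkm_sketch` is never worse than the plain K-means run it starts from, and the number of clusters stays at `k` throughout. The selection rules and the stopping test are placeholders; the paper's own criteria should be substituted where they are known.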

LI Feng, LI Mingxiang, ZHANG Yujing. Partial Iterative Fast K-means Clustering Algorithm[J]. Computer Engineering and Applications, 2020, 56(13): 63-71.

李峰，李明祥，张宇敬. 局部迭代的快速K-means聚类算法[J]. 计算机工程与应用, 2020, 56(13): 63-71.

