Optimizing k-means initial clustering centers by minimizing sum of squared error

doi:10.3778/j.issn.1002-8331.1706-0223

Abstract

The traditional k-means algorithm is sensitive to the choice of initial clustering centers and to isolated points (outliers). Based on the principle of reducing the sum of squared error (SSE) as much as possible, an optimized method for selecting the initial k-means clustering centers is presented. During initial center selection, each time a center is added, the method computes for every data point how much the current SSE would decrease if that point were taken as the new center, and selects the point that maximizes this reduction. Experiments on real datasets, compared against other algorithms, show that the method reduces the number of clustering iterations and improves clustering quality. In addition, experiments on an artificial dataset show that the method is relatively insensitive to isolated points.
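The selection rule can be illustrated with a minimal sketch, assuming Euclidean distance and NumPy. If D_i is the squared distance from point x_i to its nearest already-chosen center, then adding candidate point c lowers the SSE by sum_i max(0, D_i - ||x_i - c||^2). The sketch repeatedly adds the candidate with the largest reduction and then runs standard Lloyd iterations. The function names `sse_greedy_init` and `kmeans`, the rule for the first center (the point with the smallest total squared distance to all points), and the empty-cluster handling are illustrative assumptions, not the authors' published code.

```python
import numpy as np


def sse_greedy_init(X: np.ndarray, k: int) -> np.ndarray:
    """Choose k initial centers from the rows of X, adding one at a time the
    point whose selection most reduces the current sum of squared error."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # D[i, j] = squared Euclidean distance between points i and j.
    sq = np.einsum("ij,ij->i", X, X)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Assumption: the first center is the point with the smallest total
    # squared distance to all points, i.e. the best single-center choice.
    chosen = [int(np.argmin(D.sum(axis=0)))]
    # nearest[i] = squared distance from point i to its closest chosen center.
    nearest = D[:, chosen[0]].copy()
    for _ in range(1, k):
        # reduction[j] = SSE decrease if point j were added as a new center.
        reduction = np.maximum(nearest[:, None] - D, 0.0).sum(axis=0)
        reduction[chosen] = -1.0  # never select the same point twice
        best = int(np.argmax(reduction))
        chosen.append(best)
        nearest = np.minimum(nearest, D[:, best])
    return X[chosen].copy()


def kmeans(X: np.ndarray, k: int, max_iter: int = 100):
    """Standard Lloyd iterations started from the greedy SSE-based centers.
    Returns (centers, labels, sse, n_iter)."""
    X = np.asarray(X, dtype=float)
    centers = sse_greedy_init(X, k)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assign each point to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and SSE with respect to the returned centers.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    return centers, labels, float(d.min(axis=1).sum()), n_iter
```

On three well-separated groups of four points each, the greedy step places one initial center in each group, so Lloyd's algorithm only has to move the centers to the group means. Because the sketch precomputes the full n-by-n distance matrix, it needs O(n^2) memory and O(n^2) time per added center, which suits small and medium datasets.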

Key words: clustering, k-means algorithm, sum of squared error, isolated points

ZHOU Benjin, TAO Yizheng, JI Bin, XIE Yonghui. Optimizing k-means initial clustering centers by minimizing sum of squared error[J]. Computer Engineering and Applications, 2018, 54(15): 48-52.