Improved k-means initialization method based on data density

Abstract

K-means is a clustering method widely used in many communities. However, its results depend strongly on the initialization procedure, especially on the choice of initial centroids. Reasonable initial centroids should lie in regions of high data density, so an improved k-means initialization method based on local data density is proposed. First, a local data density function is defined; initial centroids are then chosen according to this definition. Experimental results show that the proposed method has several advantages: it effectively estimates local data density and identifies reasonable candidates for initial centroids; it performs especially well when the number of categories is relatively large; it is robust to outliers and noise; and it is easy to implement.
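The abstract does not give the exact density function or selection rule, so the Python sketch below fills both in with assumptions. Local density is taken to be the number of other points within a radius `radius` (a common neighbourhood-count estimate), and the initial centroids are the densest points that lie at least `min_sep` apart. The names `local_density`, `density_based_init`, `radius` and `min_sep` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def local_density(points: Sequence[Point], radius: float) -> List[int]:
    """Assumed density: number of other points within `radius` of each point."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius
        )
        densities.append(count)
    return densities


def density_based_init(
    points: Sequence[Point], k: int, radius: float, min_sep: float
) -> List[Point]:
    """Pick k initial centroids among the densest points (hypothetical rule).

    A candidate is accepted only if it lies at least `min_sep` away from every
    centroid already chosen, so two centroids do not fall in the same dense region.
    """
    densities = local_density(points, radius)
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(len(points)), key=lambda i: (-densities[i], i))
    centroids: List[Point] = []
    for i in order:
        if all(math.dist(points[i], c) >= min_sep for c in centroids):
            centroids.append(points[i])
            if len(centroids) == k:
                return centroids
    raise ValueError("could not find k dense points that are min_sep apart")


if __name__ == "__main__":
    data = [
        (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense region A
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # dense region B
        (0.0, 5.0), (0.1, 5.0), (0.0, 5.1),              # dense region C
        (10.0, 0.0),                                     # isolated outlier
    ]
    print(density_based_init(data, k=3, radius=0.5, min_sep=2.0))
    # → [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
```

The outlier at (10, 0) has no neighbours within the radius, so it is ranked last and never becomes a centroid. This illustrates, under the assumed density definition, the kind of robustness to outliers the abstract claims.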

Key words: clustering, k-means, initialization, data density

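Chinese abstract: Although the K-means algorithm is widely used, its performance and stability depend heavily on the initialization process, especially on the selection of initial cluster centers. Reasonable cluster centers should appear in regions where the data are dense. Based on this assumption, an initialization tuning algorithm that relies on local data density is proposed. The algorithm uses a local density function of the data and selects the initial cluster centers in high-density regions. Compared with similar algorithms, it has the following characteristics: it can autonomously discover the local density of the data distribution in a dataset; it performs better on data with a large number of categories; it is robust to outliers and noise; and it is easy to implement.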
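Why the initial centers matter can be seen by running plain Lloyd iteration (standard k-means: assign each point to its nearest centroid, then move each centroid to the mean of its points) from two different starting sets on the same one-dimensional data. This is an illustrative sketch, not the paper's experimental setup; `lloyd_kmeans` is a hypothetical helper, and the data are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lloyd_kmeans(
    points: Sequence[Point], init_centroids: Sequence[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Standard k-means (Lloyd's algorithm) started from the given centroids."""
    centroids = [tuple(c) for c in init_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    # One starting centroid per dense group.
    print(lloyd_kmeans(data, [(0.0,), (10.0,), (20.0,)])[0])
    # → [(0.5,), (10.5,), (20.5,)]
    # Two starting centroids inside the same dense group.
    print(lloyd_kmeans(data, [(0.0,), (1.0,), (10.0,)])[0])
    # → [(0.0,), (1.0,), (15.5,)]
```

In the second run Lloyd's algorithm stops at centroids 0.0, 1.0 and 15.5, splitting one group and merging the groups around 10 and 20. Starting from one point per dense region, as a density-based initializer aims to do, gives the expected 0.5, 10.5 and 20.5.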

Chinese keywords: clustering, K-means algorithm, cluster centers, density function

SHEN Guozhen. Improved k-means initialization method based on data density[J]. Computer Engineering and Applications, 2014, 50(11): 139-144.
