Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (15): 172-178.

K-means Clustering Algorithm of Optimizing Initial Clustering Center

GUO Yongkun, ZHANG Xinyou, LIU Liping, DING Liang, NIU Xiaolu

1. School of Computing, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
2. School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
Online: 2020-08-01 Published: 2020-07-30

Abstract:

An improved K-means clustering algorithm is proposed to address the traditional K-means algorithm's sensitivity to the initial centers and the resulting instability of its clustering results. The algorithm first computes the pairwise distances between samples and takes the two nearest points as a set. Using a point-to-set distance formula, it then repeatedly adds the point nearest to the set until the set contains at least α points, where α is the number of samples divided by the number of clusters; the set is then removed from the sample set. These steps are repeated until K sets are obtained, where K is the number of clusters. The mean of each set serves as an initial center, and the final clustering is obtained by running K-means from these centers. On the Wine, Hayes-Roth, Iris, Tae, Heart-statlog, Ionosphere and Haberman datasets, the improved algorithm gives more stable clustering results than traditional K-means and K-means++. On the Wine, Iris and Tae datasets, it achieves higher clustering accuracy than the K-means variant that optimizes the initial centers by minimum variance, and it obtains the largest silhouette coefficients and F1 values on all seven datasets. For datasets with large density differences, the improved algorithm is more stable and more accurate than traditional K-means and K-means++, and more efficient than the minimum-variance K-means variant.
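The initialization described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not state the point-to-set distance formula, so the sketch assumes it is the mean Euclidean distance from a point to the members of the set. The function names initial_centers and kmeans are hypothetical.

```python
import numpy as np


def initial_centers(X, k):
    """Pick k initial centers by greedily growing sets of nearby points."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    alpha = n / k  # target set size: number of samples / number of clusters
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    remaining = list(range(n))
    centers = []
    for _ in range(k):
        if not remaining:
            raise ValueError("too few samples to form k sets")
        if len(remaining) == 1:
            members = remaining[:]
        else:
            # Seed the set with the two nearest remaining points.
            sub = dist[np.ix_(remaining, remaining)].copy()
            np.fill_diagonal(sub, np.inf)
            i, j = np.unravel_index(np.argmin(sub), sub.shape)
            members = [remaining[i], remaining[j]]
        pool = [p for p in remaining if p not in members]
        # Grow the set until it holds at least alpha points.
        while len(members) < alpha and pool:
            # ASSUMPTION: point-to-set distance = mean distance to set members.
            d = dist[np.ix_(pool, members)].mean(axis=1)
            members.append(pool.pop(int(np.argmin(d))))
        centers.append(X[members].mean(axis=0))  # set mean = initial center
        remaining = pool  # delete the finished set from the sample set
    return np.array(centers)


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups of three points each.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
init_demo = initial_centers(X_demo, k=2)
labels_demo, centers_demo = kmeans(X_demo, init_demo)
```

Because the procedure involves no random choices, repeated runs on the same data yield the same initial centers, which is consistent with the stability the abstract reports relative to randomly seeded K-means and K-means++.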