Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (23): 7-14. DOI: 10.3778/j.issn.1002-8331.1908-0347


Survey on K-Means Clustering Algorithm

YANG Junchuang, ZHAO Chao   

  1. College of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
  • Online: 2019-12-01  Published: 2019-12-11




Abstract: K-Means is a partition-based algorithm in cluster analysis and an unsupervised learning algorithm. Because it is conceptually simple, effective, and easy to implement, it is widely used in fields such as machine learning. The algorithm nevertheless has limitations: the number of clusters K is difficult to determine, and open questions remain about how to choose the initial cluster centers, how to detect and remove outliers, and which distance and similarity measures to use. This paper summarizes improvements to K-Means from several of these aspects and compares them with the classical K-Means algorithm. It also analyzes the advantages and disadvantages of the improved algorithms and points out their remaining problems. Finally, it discusses the future directions and trends of K-Means research.

Key words: K-Means, clustering algorithm, cluster center, outliers

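
For readers unfamiliar with the classical algorithm that the survey takes as its baseline, a minimal sketch follows. It is an illustration only, not code from the paper: the names `kmeans`, `sse`, `init`, and `seed`, the Euclidean distance, and the toy data are all assumptions made for this example. It shows the two alternating steps (assign each point to its nearest center, then move each center to the mean of its members) and exposes K and the initial centers as explicit inputs, since those are two of the limitations the abstract lists.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Classical K-Means on a list of equal-length numeric tuples.

    `init` (hypothetical parameter): optional list of k starting centers.
    If omitted, k data points are drawn at random using `seed`.
    Returns (centers, labels).
    """
    if init is None:
        centers = random.Random(seed).sample(points, k)
    else:
        centers = [tuple(c) for c in init]
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the iteration has converged
        labels = new_labels

        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    # Toy data (assumed for illustration): two well-separated groups.
    data = [(0, 0), (0, 2), (2, 0), (10, 10), (10, 12), (12, 10)]
    centers, labels = kmeans(data, k=2, init=[(0, 0), (10, 10)])
    print(centers, labels, sse(data, centers, labels))
```

With the assumed starting centers `(0, 0)` and `(10, 10)`, the two groups are recovered after a single update. Passing a different `init`, or a `k` that does not match the data, is a simple way to observe the sensitivity to initialization and to the choice of K that the surveyed improvements try to address.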