Survey on K-Means Clustering Algorithm

doi:10.3778/j.issn.1002-8331.1908-0347

Abstract

Abstract: K-Means is a partition-based algorithm in cluster analysis and an unsupervised learning algorithm. Because it is conceptually simple, effective, and easy to implement, it is widely used in fields such as machine learning. However, K-Means also has limitations: the number of clusters K is difficult to determine, the initial cluster centers must be chosen, outliers must be detected and removed, and a distance or similarity measure must be selected. This paper summarizes improvements to K-Means from several aspects, compares them with the classical K-Means algorithm, analyzes the advantages and disadvantages of the improved algorithms, and points out the problems that remain. Finally, it discusses future development directions and trends for K-Means.

Key words: K-Means, clustering algorithm, cluster center, outliers
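
For orientation, the classical K-Means procedure that the abstract refers to can be sketched as follows. This is a minimal illustration and not code from the surveyed paper: the function names `assign` and `kmeans`, the parameters `n_iter` and `seed`, the random choice of initial centers, and the Euclidean distance are all assumptions made for this example. Each of the limitations listed in the abstract corresponds to a choice that is hard-coded here (the value of `k`, the initialization, the lack of outlier handling, and the distance measure).

```python
import numpy as np


def assign(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Classical K-Means (Lloyd-style iteration); an illustrative sketch only."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points drawn at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of its cluster; an empty cluster
        # keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Calling `kmeans(data, k=2)` on an `(n, d)` array returns the final centers and one cluster label per row. The result depends on `k` and on the random initialization, which is the sensitivity the abstract describes.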


YANG Junchuang, ZHAO Chao. Survey on K-Means Clustering Algorithm[J]. Computer Engineering and Applications, 2019, 55(23): 7-14.


