K-means Clustering Algorithm of Optimizing Initial Clustering Center

Computer Engineering and Applications, 2020, Vol. 56, Issue 15: 172-178. DOI: 10.3778/j.issn.1002-8331.1910-0220


GUO Yongkun, ZHANG Xinyou, LIU Liping, DING Liang, NIU Xiaolu

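1. School of Computing, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
2. School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China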

Publication date: 2020-08-01  Release date: 2020-07-30


Abstract

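The traditional K-means algorithm is highly sensitive to its initial centers, so its clustering results are unstable. To address this problem, an improved K-means clustering algorithm is proposed. The algorithm first computes the distances between samples and forms a set from the two closest points. Using a point-to-set distance formula, it then repeatedly adds the remaining point closest to the set until the set contains at least α points, where α is the number of data points in the sample set divided by the number of clusters. The set is then deleted from the sample set, and these steps are repeated until K sets have been obtained (K being the number of clusters). The mean of each set is used as an initial center, and the final clustering result is obtained with the K-means algorithm. On the Wine, Hayes-Roth, Iris, Tae, Heart-stalog, Ionosphere and Haberman datasets, the improved algorithm gives more stable clustering results than traditional K-means and K-means++. On the Wine, Iris and Tae datasets, it achieves higher clustering accuracy than the K-means algorithm that optimizes the initial clustering centers by minimum variance, and on the seven datasets it obtains the largest silhouette coefficient and F1 value among the compared algorithms. For datasets with large differences in density, its clustering results are more stable and more accurate than those of traditional K-means and K-means++, and it is more efficient than the minimum-variance K-means algorithm.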

Key words: K-means clustering algorithm, algorithm optimization, initial clustering center

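The two-stage procedure summarized in the abstract (grow K dense sets, take their means as initial centers, then run K-means) can be sketched in plain Python as below. This is a minimal, non-authoritative sketch, not the authors' code. The abstract does not state the point-to-set distance formula, so the hypothetical helper `point_to_set_distance` assumes the mean Euclidean distance from a point to the set's current members; the names `initial_centers` and `kmeans` are likewise illustrative. Following the abstract, α is the number of samples divided by K, and each set grows until it holds at least α points.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def point_to_set_distance(p: Point, members: List[Point]) -> float:
    # ASSUMPTION: the abstract does not give the point-to-set formula;
    # this hypothetical helper uses the mean distance from p to every member.
    return sum(euclidean(p, m) for m in members) / len(members)


def initial_centers(data: List[Point], k: int) -> List[List[float]]:
    """Grow k sets of at least alpha = len(data) / k points; return their means."""
    alpha = len(data) / k
    remaining = list(range(len(data)))  # indices still in the sample set
    centers: List[List[float]] = []
    for _ in range(k):
        if not remaining:
            break
        if len(remaining) == 1:
            group = [remaining[0]]
        else:
            # Seed the set with the closest pair among the remaining samples.
            pairs = [(a, b) for i, a in enumerate(remaining) for b in remaining[i + 1:]]
            first, second = min(pairs, key=lambda ab: euclidean(data[ab[0]], data[ab[1]]))
            group = [first, second]
        # Add the remaining point nearest to the set until it holds >= alpha points.
        while len(group) < alpha:
            candidates = [p for p in remaining if p not in group]
            if not candidates:
                break  # the last set may be smaller when len(data) / k is not an integer
            members = [data[g] for g in group]
            group.append(min(candidates, key=lambda p: point_to_set_distance(data[p], members)))
        # Delete the finished set from the sample set and record its mean.
        remaining = [p for p in remaining if p not in group]
        centers.append([sum(col) / len(group) for col in zip(*(data[g] for g in group))])
    return centers


def kmeans(
    data: List[Point], centers: List[List[float]], max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Standard Lloyd iterations starting from the supplied centers."""
    centers = [list(c) for c in centers]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: euclidean(x, centers[c])) for x in data
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels or [], centers
```

Calling `kmeans(data, initial_centers(data, k))` mirrors the structure described in the abstract: center selection followed by ordinary K-means. Because the selection step makes no random draws, repeated runs on the same data give the same result, which is consistent with the stability the abstract reports. The sketch recomputes distances instead of caching the pairwise distance matrix, which keeps it short but makes it slow on large datasets.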
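The abstract compares the algorithms by silhouette coefficient and F1 value. The sketch below computes both in plain Python. The silhouette follows the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all samples. The abstract does not say which clustering F1 variant was used, so the hypothetical `clustering_f_measure` assumes the common class-size-weighted best-match F-measure; treat it as an illustration rather than the paper's exact metric.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(data: List[Sequence[float]], labels: List[Hashable]) -> float:
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all samples."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, x in enumerate(data):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        # a(i): mean distance to the other members of the sample's own cluster.
        a = sum(euclidean(x, data[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(x, data[j]) for j, lab in enumerate(labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)


def clustering_f_measure(true_labels: List[Hashable], pred_labels: List[Hashable]) -> float:
    """Class-size-weighted best-match F-measure between classes and clusters.

    ASSUMPTION: the abstract does not define its F1 value; this is one common variant.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)
    overlap = Counter(zip(true_labels, pred_labels))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            both = overlap[(cls, clu)]
            if both:
                precision, recall = both / n_clu, both / n_cls
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_cls / n * best  # weight each class by its share of samples
    return total
```

For example, with `pts = [[0.0], [1.0], [10.0], [11.0]]`, `silhouette_score(pts, [0, 0, 1, 1])` is about 0.8997, while the mixed labelling `[0, 1, 0, 1]` scores -0.45. A clustering that matches the true classes scores 1.0 under `clustering_f_measure` regardless of how the cluster ids are numbered.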

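Cite this article: GUO Yongkun, ZHANG Xinyou, LIU Liping, DING Liang, NIU Xiaolu. K-means Clustering Algorithm of Optimizing Initial Clustering Center[J]. Computer Engineering and Applications, 2020, 56(15): 172-178.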

