Partial Iterative Fast K-means Clustering Algorithm


Computer Engineering and Applications, 2020, Vol. 56, Issue (13): 63-71. DOI: 10.3778/j.issn.1002-8331.1906-0070

LI Feng, LI Mingxiang, ZHANG Yujing

1. Information Management and Engineering Department, Hebei Finance University, Baoding, Hebei 071051, China
2. Applied Technology Research and Development Center Wisdom Finance in Hebei University, Baoding, Hebei 071051, China

Online: 2020-07-01 Published: 2020-07-02

Abstract

K-means is one of the most popular and widely used clustering methods, but suitable initial values for the cluster centers cannot always be found, especially as the number of clusters grows, and poorly chosen initial values degrade the clustering result. To address this problem, this paper proposes a partial iterative fast K-means clustering algorithm, Partial Iterative Fast K-means plus-minus (PIFKM+-). Starting from a K-means clustering, the algorithm repeatedly looks for a cluster that can be split and a cluster that can be removed, and in each iteration re-clusters only the local data affected by that change. This lowers the time complexity of updating the clustering and improves the clustering result. When the number of clusters is large, PIFKM+- updates clusters quickly, is insensitive to the initial values of the cluster centers, and improves clustering accuracy. Experiments on simulated and real data sets, comparing the method with K-means and K-means++, verify its accuracy, efficiency, and scalability. A statistical analysis of the experimental results shows that PIFKM+- improves clustering accuracy without losing much time efficiency.

Key words: K-means algorithm, cluster segmentation, cluster removal, partial iterative clustering, cluster neighbor
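The split/remove/re-cluster loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the clusters to split or remove are chosen, so everything concrete here is an assumption made for illustration: the cluster with the largest within-cluster squared error is split, the cluster whose center lies closest to another center is removed, the removed cluster's neighbor is its nearest center, and a round is accepted only if the total squared error drops. The function names (`lloyd`, `split_remove_step`, `split_remove_kmeans`) are hypothetical and this is not the authors' implementation.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centers):
    # Index of the nearest center for every point (ties go to the lower index).
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def lloyd(points, centers, max_iter=50):
    # Plain K-means updates restricted to the given points and centers.
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(mean(members) if members else c)
        if updated == centers:
            break
        centers = updated
    return centers

def sse(points, centers):
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def split_remove_step(points, centers):
    # One round: split one cluster, remove another, re-cluster affected data.
    k = len(centers)
    labels = assign(points, centers)
    cost = [0.0] * k
    for p, l in zip(points, labels):
        cost[l] += sq_dist(p, centers[l])
    split = max(range(k), key=lambda j: cost[j])        # assumed criterion
    members = [p for p, l in zip(points, labels) if l == split]
    if k < 2 or not members:
        return list(centers)

    def gap(j):  # squared distance from center j to its nearest other center
        return min(sq_dist(centers[j], centers[i]) for i in range(k) if i != j)

    remove = min((j for j in range(k) if j != split), key=gap)  # assumed criterion
    neighbor = min((i for i in range(k) if i != remove),
                   key=lambda i: sq_dist(centers[remove], centers[i]))
    affected = sorted({split, remove, neighbor})
    local_points = [p for p, l in zip(points, labels) if l in affected]
    # The removed center is re-seeded inside the cluster being split.
    seed = max(members, key=lambda p: sq_dist(p, centers[split]))
    local = lloyd(local_points,
                  [seed if j == remove else centers[j] for j in affected])
    trial = list(centers)
    for j, c in zip(affected, local):
        trial[j] = c
    return trial

def split_remove_kmeans(points, init_centers, max_rounds=10):
    centers = lloyd(points, init_centers)
    best = sse(points, centers)
    for _ in range(max_rounds):
        trial = split_remove_step(points, centers)
        trial_cost = sse(points, trial)
        if trial_cost >= best - 1e-12:   # stop when a round no longer helps
            break
        centers, best = trial, trial_cost
    return centers, best

# Three well-separated blobs; two initial centers deliberately share one blob.
points = [(x + dx, dy) for x in (0, 10, 20) for dx in (0, 1) for dy in (0, 1)]
init = [(0, 0), (1, 1), (15, 0.5)]
centers, best = split_remove_kmeans(points, init)
print(sse(points, lloyd(points, init)), best)  # plain K-means vs. sketch
```

On this toy data plain K-means stays in a poor local optimum (total squared error about 205.3), while one split/remove round moves a center into the uncovered blob and reaches 6.0. Only the points belonging to the affected clusters are re-clustered in each round, which is the source of the cost saving the abstract refers to.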

LI Feng, LI Mingxiang, ZHANG Yujing. Partial Iterative Fast K-means Clustering Algorithm[J]. Computer Engineering and Applications, 2020, 56(13): 63-71.

