Optimizing k-means initial clustering centers by minimizing sum of squared error

Computer Engineering and Applications, 2018, Vol. 54, Issue (15): 48-52. DOI: 10.3778/j.issn.1002-8331.1706-0223

ZHOU Benjin, TAO Yizheng, JI Bin, XIE Yonghui

Institute of Computer Application, China Academy of Engineering Physics, Mianyang, Sichuan 621900, China

Online: 2018-08-01 Published: 2018-07-26

Abstract: The traditional k-means algorithm is sensitive to its initial clustering centers and to isolated points. Based on the principle of reducing the sum of squared error (SSE) as much as possible, this paper proposes a method for optimizing the initial clustering centers of k-means that maximizes the reduction of the current SSE. In the initial center selection phase, each time a clustering center is added, the reduction in SSE that each data point would achieve as the new center is computed, and the data point that reduces the SSE the most is selected as an initial clustering center. In comparisons with other algorithms on real datasets, the experimental results show that the method effectively reduces the number of clustering iterations and improves clustering quality. In addition, experiments on artificial simulated data show that the method is relatively insensitive to isolated points.

Key words: clustering, k-means algorithm, sum of squared error, isolated points
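The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `sq_dist` and `sse_greedy_init` are hypothetical, squared Euclidean distance is assumed, and the abstract does not say how the first center is chosen, so the sketch assumes it is the data point with the smallest total squared distance to all points (the single center that minimizes SSE among the data points).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse_greedy_init(points, k):
    """Greedily pick k initial centers from `points`.

    Returns (centers, sse), where sse is the sum of squared distances
    from every point to its nearest chosen center.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # d2[c][i]: squared distance between candidate c and point i.
    # This costs O(n^2) time and memory, fine for a small illustration.
    d2 = [[sq_dist(p, q) for q in points] for p in points]

    # Assumption (not stated in the abstract): the first center is the
    # point whose total squared distance to all points is smallest.
    first = min(range(n), key=lambda c: sum(d2[c]))
    chosen = [first]
    # closest[i]: squared distance from point i to its nearest center.
    closest = list(d2[first])

    while len(chosen) < k:
        def reduction(c):
            # SSE decrease if candidate c were added as a center: each
            # point contributes only if c is nearer than its current center.
            return sum(max(closest[i] - d2[c][i], 0.0) for i in range(n))

        nxt = max(range(n), key=reduction)
        chosen.append(nxt)
        closest = [min(closest[i], d2[nxt][i]) for i in range(n)]

    return [points[c] for c in chosen], sum(closest)
```

For the one-dimensional points 0, 1, 2, 10, 11, 12, 13 (passed as one-element tuples) with k = 2, this sketch returns the centers 10 and 1 with an SSE of 16. The returned centers would then seed the standard k-means iterations.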

ZHOU Benjin, TAO Yizheng, JI Bin, XIE Yonghui. Optimizing k-means initial clustering centers by minimizing sum of squared error[J]. Computer Engineering and Applications, 2018, 54(15): 48-52.

