Improved K-means algorithm based on clustering criterion function

Computer Engineering and Applications, 2011, Vol. 47, Issue 11: 123-127.

• Database, Signal and Information Processing •

ZHANG Xuefeng1, ZHANG Guizhen2, LIU Peng1,2

1. School of Information Management and Engineering, Shanghai University of Finance & Economics, Shanghai 200433, China
2. School of Continuing Education, Shanghai University of Finance & Economics, Shanghai 200080, China

Online: 2011-04-11 Published: 2011-04-11

Abstract

Abstract: The clustering criterion function used by the K-means algorithm is obtained by directly summing the squared errors of all clusters, so it cannot effectively handle datasets whose clusters differ widely in size and density. In this study, the criterion function is redefined as the weighted sum of the within-cluster standard deviations, where the weight of each cluster is the ratio of the number of points in that cluster to the total number of points. The way points are reassigned to clusters is also modified: instead of being assigned to the closest centroid, each point is assigned to the cluster whose centroid has the minimum weighted distance to it. Experiments on simulated datasets show that the improved K-means algorithm significantly enhances clustering quality by reducing the probability that points of large sparse clusters are misassigned to neighboring small dense clusters. Experiments on UCI datasets show that the improved algorithm produces more compact clusters. These results support the effectiveness of the improved K-means algorithm.

Key words: K-means algorithm, cluster, clustering criterion function

ZHANG Xuefeng, ZHANG Guizhen, LIU Peng. Improved K-means algorithm based on clustering criterion function[J]. Computer Engineering and Applications, 2011, 47(11): 123-127.
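
As an illustration of the two criterion functions contrasted in the abstract, the following minimal Python sketch computes the classical sum of squared errors and the weighted sum of within-cluster standard deviations for a given partition. The function names and the toy data are hypothetical, and the standard deviation is assumed to take the form sqrt(SSE_i / n_i). The abstract does not fix that form, nor does it define the weighted distance used in the assignment step, so that step is not sketched here.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_error(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)


def cluster_std(points):
    """Within-cluster standard deviation, assumed here to be sqrt(SSE_i / n_i)."""
    return math.sqrt(squared_error(points) / len(points))


def sse_criterion(clusters):
    """Classical K-means criterion: per-cluster squared errors added directly."""
    return sum(squared_error(c) for c in clusters)


def weighted_std_criterion(clusters):
    """Sum over clusters of (n_i / n) * sigma_i, as described in the abstract."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_std(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical toy partition: one small dense and one large sparse cluster.
    dense = [(0.0, 0.0), (0.0, 2.0)]
    sparse = [(10.0, 0.0), (10.0, 6.0), (16.0, 0.0), (16.0, 6.0)]
    print("SSE criterion:", sse_criterion([dense, sparse]))
    print("Weighted std criterion:", weighted_std_criterion([dense, sparse]))
```

Both functions take a partition as a list of clusters, each cluster being a list of coordinate tuples, so they can be used to compare candidate partitions of the same dataset under either criterion.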

