K-means algorithm based on average density optimizing initial cluster centre

Computer Engineering and Applications, 2014, Vol. 50, Issue (20): 135-138.

• Database, Data Mining, Machine Learning •


XING Changzheng, GU Hao

School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China

Online: 2014-10-15 Published: 2014-10-28

Abstract

Abstract: Existing k-means algorithms that optimize the initial cluster centres by density suffer from a large search range for the centres, long running time, and clustering results that are sensitive to isolated points. To address these problems, a k-means algorithm based on average density optimization of the initial cluster centres, adk-means, is proposed. The algorithm first separates the isolated points from the data set and computes the average density of the remaining samples; the isolated points take no part in computing the mean of the samples in each cluster during clustering. The initial cluster centres are then selected from the set of density parameters that exceed the average density, and each isolated point is assigned to its nearest cluster centre by the minimum-distance principle, until the whole data set is classified. Experimental results show that adk-means converges faster, is more stable, and achieves higher clustering accuracy than existing density-based k-means algorithms, and that it eliminates the sensitivity of the clustering results to isolated points.

Key words: k-means algorithm, cluster centre, average density, isolated points, convergence
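
The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how density or isolated points are defined, so the sketch assumes that a point's density is the number of other points within a radius `eps`, that a point is isolated when it has fewer than `min_neighbors` such neighbours, and that centres are picked from the above-average-density candidates by taking the densest one first and then the candidate farthest from the centres already chosen. The function name `adk_means_sketch` and all of its parameters are hypothetical.

```python
import numpy as np


def adk_means_sketch(X, k, eps, min_neighbors=1, max_iter=100):
    """Illustrative sketch only; density and isolation rules are assumptions."""
    X = np.asarray(X, dtype=float)

    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # ASSUMPTION: density = number of other points within radius eps.
    density = (dist <= eps).sum(axis=1) - 1

    # ASSUMPTION: a point is isolated if it has fewer than min_neighbors.
    isolated = density < min_neighbors
    core_idx = np.flatnonzero(~isolated)

    # Average density of the remaining (non-isolated) samples.
    avg_density = density[core_idx].mean()

    # Candidate centres: non-isolated points denser than the average.
    cand = core_idx[density[core_idx] > avg_density]
    if len(cand) < k:
        raise ValueError("fewer than k candidates above the average density")

    # ASSUMPTION: densest candidate first, then farthest-from-chosen.
    chosen = [cand[np.argmax(density[cand])]]
    while len(chosen) < k:
        d_to_chosen = dist[np.ix_(cand, chosen)].min(axis=1)
        chosen.append(cand[np.argmax(d_to_chosen)])
    centres = X[chosen].copy()

    # Standard k-means iterations on the non-isolated samples only,
    # so isolated points never influence the cluster means.
    core = X[core_idx]
    for _ in range(max_iter):
        d = np.linalg.norm(core[:, None, :] - centres[None, :, :], axis=2)
        core_labels = d.argmin(axis=1)
        new_centres = np.array([
            core[core_labels == j].mean(axis=0)
            if np.any(core_labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres

    # Finally assign every sample, isolated points included,
    # to its nearest centre (minimum-distance principle).
    d_all = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    labels = d_all.argmin(axis=1)
    return labels, centres, isolated
```

Calling `adk_means_sketch(X, k=2, eps=0.5)` on two compact groups plus one distant point returns a label for every sample, the final centres, and a boolean mask of the points treated as isolated. Because the isolated point is labelled only after the iterations finish, it does not shift the centre of the cluster it joins.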

XING Changzheng, GU Hao. K-means algorithm based on average density optimizing initial cluster centre[J]. Computer Engineering and Applications, 2014, 50(20): 135-138.
