Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (20): 135-138.

• Database, Data Mining, Machine Learning •

K-means algorithm based on average density optimizing initial cluster centre

XING Changzheng, GU Hao

  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2014-10-15 Published: 2014-10-28

Abstract: Existing k-means algorithms that optimize the initial cluster centres by density suffer from a large search range for the cluster centres, long running time, and clustering results that are sensitive to isolated points. To address these problems, a k-means algorithm that optimizes the initial cluster centres using average density, adk-means, is proposed. The algorithm first separates the isolated points from the data set and computes the average density of the remaining samples; isolated points take no part in computing the mean of the samples in each cluster during clustering. The initial cluster centres are selected from the set of density parameters that exceed the average density, and each isolated point is then assigned to its nearest cluster centre according to the minimum-distance principle, until the whole data set has been classified. Experimental results show that adk-means converges faster, is more stable, and achieves higher clustering accuracy than existing density-based k-means algorithms, and that it eliminates the sensitivity of the clustering results to isolated points.

Key words: k-means algorithm, cluster centre, average density, isolated points, convergence
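
The steps named in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not define the density parameter, the isolated-point criterion, or how centres are picked among the high-density candidates. The sketch therefore assumes that density is the number of neighbours within a radius `eps`, that a point with fewer than `min_neighbors` neighbours is isolated, and that centres are chosen by taking the densest candidate first and then the candidate farthest from the centres already chosen. The names `adk_means_sketch`, `eps`, and `min_neighbors` are hypothetical.

```python
import math


def nearest(p, centres):
    """Index of the centre closest to point p (minimum-distance principle)."""
    return min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))


def adk_means_sketch(points, k, eps, min_neighbors, max_iter=100):
    n = len(points)
    # ASSUMPTION: density = number of other points within distance eps.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]
    # ASSUMPTION: a point with fewer than min_neighbors neighbours is isolated.
    isolated = [i for i in range(n) if density[i] < min_neighbors]
    core = [i for i in range(n) if density[i] >= min_neighbors]
    if len(core) < k:
        raise ValueError("not enough non-isolated points for k clusters")

    # Average density over the remaining (non-isolated) samples.
    avg_density = sum(density[i] for i in core) / len(core)
    # Candidate centres: samples whose density exceeds the average.
    candidates = [i for i in core if density[i] > avg_density]
    if len(candidates) < k:
        candidates = core  # ASSUMPTION: fall back when too few candidates

    # ASSUMPTION: densest candidate first, then farthest-from-chosen.
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        chosen.append(max(
            (i for i in candidates if i not in chosen),
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        ))
    centres = [tuple(points[i]) for i in chosen]

    # Standard k-means iterations on non-isolated points only, so isolated
    # points never influence the cluster means.
    labels = {}
    for _ in range(max_iter):
        new_labels = {i: nearest(points[i], centres) for i in core}
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in core if labels[i] == c]
            if members:
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))

    # Finally, attach each isolated point to its nearest centre.
    for i in isolated:
        labels[i] = nearest(points[i], centres)
    return [labels[i] for i in range(n)], centres, isolated
```

For example, with two five-point squares centred at (0.5, 0.5) and (10.5, 10.5) plus a distant point at (5, 30), calling `adk_means_sketch(points, k=2, eps=1.2, min_neighbors=2)` flags the distant point as isolated, keeps it out of the mean computation, and attaches it to the nearer of the two clusters at the end.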