Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (14): 143-147. DOI: 10.3778/j.issn.1002-8331.1702-0127

• Pattern Recognition and Artificial Intelligence •

Global optimized K-means clustering algorithm based on sample density

XUE Yinxi (薛印玺), XU Hongwen (许鸿文), LI Ling (李羚)

  1. Faculty of Mechanical & Electronic Information, China University of Geosciences, Wuhan 430074, China
  • Online: 2018-07-15 Published: 2018-08-06

Abstract: The traditional K-means clustering algorithm produces results that depend on the initial cluster centers, and it easily falls into local optima. To address these problems, this paper proposes a global optimized K-means clustering algorithm based on sample density (KMS-GOSD). In each iteration, KMS-GOSD first uses a Gaussian model to obtain a pre-estimated density for every cluster center, then applies an offset operation to the cluster center whose actual density falls furthest below its pre-estimated density. By optimizing the positions of the cluster centers in this way, KMS-GOSD both improves global exploration ability and overcomes the dependence on the initial cluster centers. Comparative experiments on standard UCI data sets show that the improved algorithm achieves higher accuracy and stability than the traditional algorithm.

Key words: K-means, clustering center, sample density, global optimization
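
The abstract does not give the density formulas, so the following is only an illustrative sketch of the selection step it describes (compare each cluster center's actual density with a Gaussian pre-estimate and pick the center with the largest shortfall). It is not the authors' KMS-GOSD implementation. The function name `density_deficit`, the kernel bandwidth `h`, the use of a mean Gaussian-kernel weight as "actual density", and the isotropic Gaussian fit used for the pre-estimate are all assumptions made for illustration.

```python
import numpy as np

def density_deficit(X, labels, centers, h=1.0):
    """Hypothetical helper: compare actual vs. Gaussian-expected density per center.

    X: (n, d) samples; labels: (n,) cluster index per sample;
    centers: (k, d) cluster centers; h: assumed kernel bandwidth.
    Returns (actual, expected, worst), where worst is the index of the center
    whose actual density falls furthest below its pre-estimate.
    """
    k, d = centers.shape
    actual = np.zeros(k)
    expected = np.zeros(k)
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            continue  # empty cluster: both densities stay at zero
        sq_dist = np.sum((members - centers[j]) ** 2, axis=1)
        # Assumed "actual density": mean Gaussian-kernel weight of the members.
        actual[j] = np.mean(np.exp(-sq_dist / (2.0 * h ** 2)))
        # Per-dimension variance of an isotropic Gaussian fitted to the cluster.
        sigma2 = sq_dist.mean() / d
        # Closed-form mean kernel weight if members were N(center, sigma2 * I).
        expected[j] = (h ** 2 / (h ** 2 + sigma2)) ** (d / 2.0)
    worst = int(np.argmax(expected - actual))
    return actual, expected, worst
```

Under these assumptions, a center stranded between two separate groups of samples has few samples nearby, so its actual density falls well below the Gaussian pre-estimate and it becomes the candidate for the offset operation. How far and in which direction the center is then moved is not stated in the abstract and is left out of the sketch.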