Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (8): 137-142. DOI: 10.3778/j.issn.1002-8331.1611-0483

• Pattern Recognition and Artificial Intelligence •

Density peaks algorithm with automatic determination of cluster centers

WANG Yang1, ZHANG Guizhu2   

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2018-04-15   Published: 2018-05-02

Abstract: Density Peaks Clustering (DPC) is a density-based clustering algorithm whose advantages include not requiring clustering parameters to be specified and the ability to discover non-spherical clusters. DPC has two defects, however: the cutoff distance d_c is computed empirically and therefore does not adapt well to different scenarios, and cluster centers are selected manually, which makes it difficult to obtain the actual cluster centers accurately. To address these defects, a method is proposed that adapts the cutoff distance using the Gini index and obtains the cluster centers automatically, which effectively overcomes the inability of the traditional DPC algorithm to handle complex data sets. The algorithm first determines the cutoff distance d_c adaptively through the Gini index, then calculates the cluster center weight of each point, and finally uses the change in slope to find the critical point. This strategy avoids the errors introduced by manually selecting cluster centers from the decision graph. Experiments show that the new algorithm not only determines the cluster centers automatically but also achieves higher accuracy than the original algorithm.

Key words: density peak, clustering, cluster center point, Gini index
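
The abstract names three steps (an adaptive d_c chosen through the Gini index, a cluster center weight per point, and a slope-based critical point) without giving formulas. The Python sketch below shows one plausible reading of those steps and is not the authors' implementation. It assumes the standard DPC quantities (a Gaussian-kernel local density rho, and delta as the distance to the nearest point of higher density), takes the cluster center weight to be gamma = rho * delta, picks d_c from a caller-supplied candidate list by minimising the Gini impurity 1 - sum(p_i^2) of the normalised weights, and treats the largest drop between consecutive sorted weights as the critical point. All function names and the candidate list are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def local_density(D, dc):
    """Gaussian-kernel local density rho_i (standard DPC variant), self excluded."""
    K = np.exp(-(D / dc) ** 2)
    np.fill_diagonal(K, 0.0)
    return K.sum(axis=1)


def delta_distance(D, rho):
    """delta_i: distance to the nearest point with higher density.

    The densest point gets its largest distance to any other point.
    """
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.empty(len(rho))
    delta[order[0]] = D[order[0]].max()
    for k in range(1, len(order)):
        i = order[k]
        delta[i] = D[i, order[:k]].min()
    return delta


def gini_index(weights):
    """Gini impurity 1 - sum(p_i^2) of the normalised weights (assumed form)."""
    total = weights.sum()
    if total <= 0:
        return 1.0
    p = weights / total
    return 1.0 - float(np.sum(p ** 2))


def center_weights(D, dc):
    """Assumed cluster center weight: gamma_i = rho_i * delta_i."""
    rho = local_density(D, dc)
    return rho * delta_distance(D, rho)


def choose_dc(D, candidates):
    """Pick the candidate d_c whose weights give the smallest Gini impurity."""
    return min(candidates, key=lambda dc: gini_index(center_weights(D, dc)))


def select_centers(gamma):
    """Cut the sorted weights at the largest consecutive drop (assumed rule)."""
    order = np.argsort(-gamma, kind="stable")
    if len(order) < 2:
        return order
    drops = -np.diff(gamma[order])  # non-negative drops between neighbours
    k = int(np.argmax(drops)) + 1
    return order[:k]


def dpc_centers(X, candidates):
    """Run the three steps and return (chosen d_c, indices of cluster centers)."""
    D = pairwise_distances(np.asarray(X, dtype=float))
    dc = choose_dc(D, candidates)
    return dc, select_centers(center_weights(D, dc))
```

Calling `dpc_centers(X, [0.5, 1.0, 2.0])` on an `(n, d)` array returns the selected d_c and the indices of the points treated as cluster centers. The candidate grid and the "largest drop" rule are simplifications chosen for brevity; the paper's actual Gini formulation and slope criterion may differ.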