Computer Engineering and Applications, 2018, Vol. 54, Issue (12): 138-145. DOI: 10.3778/j.issn.1002-8331.1710-0034


Density peaks clustering method based on density dichotomy

XU Chaoyang1, LIN Yaohai2, ZHANG Ping1   

  1. School of Information Engineering, Putian University, Putian, Fujian 351100, China
  2. College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  • Online: 2018-06-15 Published: 2018-07-03

Original Chinese title: 基于密度二分法的密度峰值聚类方法

Authors (Chinese names): 许朝阳1, 林耀海2, 张萍1

Abstract: Density Peaks Clustering (DPC) is a well-known clustering algorithm that handles data of various kinds, regardless of cluster shape or features. It has been widely studied and applied in many fields in recent years. However, its clustering quality degrades when the densities of the cluster centers differ greatly, or when a single cluster contains several density peaks. To address this, a density peaks clustering method based on density dichotomy is proposed. First, the average density over all data points is computed, and the points are divided into a high-density group and a low-density group. Second, cluster centers are identified from the decision graph of the high-density points, and centers that belong to the same cluster are merged depending on whether data points within reachable distance exist. Finally, both the high-density and the low-density points are assigned to the appropriate cluster centers according to the assignment strategy proposed in this paper. Experiments on several synthetic and real datasets show that the proposed algorithm produces better clustering results than existing DPC algorithms.
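The first two steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how density is estimated, so the Gaussian-kernel density, the Euclidean distance, the cutoff parameter `d_c`, and the function names `density_dichotomy` and `delta_among` are all assumptions made for illustration. The merging of centers and the final assignment strategy are not described in enough detail in the abstract to sketch, so they are omitted.

```python
import numpy as np


def density_dichotomy(points, d_c):
    """Split points into high- and low-density groups at the mean density.

    Assumptions (not from the abstract): Euclidean distance and a
    Gaussian-kernel local density with cutoff d_c.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Gaussian-kernel density; subtract 1 to drop each point's self-term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    high = rho >= rho.mean()  # True marks a high-density point
    return rho, dist, high


def delta_among(rho, dist, mask):
    """DPC delta restricted to the points selected by mask.

    delta is the distance to the nearest denser point inside the mask;
    a point with no denser neighbour gets its largest distance in the mask.
    """
    idx = np.flatnonzero(mask)
    delta = np.zeros(idx.size)
    for k, i in enumerate(idx):
        denser = idx[rho[idx] > rho[i]]
        if denser.size:
            delta[k] = dist[i, denser].min()
        else:
            delta[k] = dist[i, idx].max()
    return idx, delta


# Four tightly packed points and one isolated point.
points = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [10.0, 10.0]]
rho, dist, high = density_dichotomy(points, d_c=1.0)
idx, delta = delta_among(rho, dist, high)
```

Plotting `rho[idx]` against `delta` gives a decision graph over the high-density points only; candidates for cluster centers are the points where both values are large.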

Key words: Density Peaks Clustering (DPC), density dichotomy, decision graph, high-density points

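Abstract (translated from the Chinese): The density peaks clustering (DPC) method can cluster data quickly, regardless of their shape and the dimensionality of the space that contains them, and it has been widely studied and applied in recent years. However, DPC's results suffer when the densities of the cluster centers differ substantially, or when a single cluster contains multiple density centers. To address this, a density peaks clustering method based on density dichotomy is proposed. First, the average density of all the data is computed and the data are divided into high-density points and low-density points. Then, after cluster centers are identified from the decision graph of the high-density points, cluster centers belonging to the same cluster are merged according to whether data points within reachable distance exist. Finally, the proposed assignment strategy assigns both high-density and low-density points to suitable cluster centers, which completes the clustering. Experiments on several synthetic and real datasets show that the method clusters markedly better than existing DPC methods.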

Key words (translated from the Chinese): density peaks clustering, density dichotomy, decision graph, high-density points