Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (21): 75-82. DOI: 10.3778/j.issn.1002-8331.2105-0444

• Theory, Research and Development •

Density Peak Clustering Algorithm Combining Density-Ratio and System Evolution

CAO Junrong, ZHANG Desheng, XIAO Yanting   

  1. College of Science, Xi’an University of Technology, Xi’an 710054, China
  • Online: 2022-11-01 Published: 2022-11-01

结合密度比和系统演化的密度峰值聚类算法

曹俊茸,张德生,肖燕婷   

  1. 西安理工大学 理学院,西安 710054

Abstract: The density peak clustering (DPC) algorithm can effectively cluster non-spherical data. However, it requires the cutoff distance as an input and relies on manual selection of the cluster centers, which sometimes leads to poor clustering results. To address these problems, this paper proposes a density peak clustering algorithm combining density-ratio and system evolution (DS-DPC). A natural nearest neighbor search is used to obtain the number of neighbors of each sample point, and the density formula is improved following the idea of density-ratio so that it reflects the distribution of the surrounding samples. The product of the local density and the relative distance is sorted in descending order, the cluster centers are selected according to the ranking values, and the remaining samples are clustered according to the allocation strategy of the DPC algorithm, which avoids the subjectivity of manually selecting the cluster centers. The system evolution method is used to determine whether the clustering results need to be merged or separated. Experiments on multiple data sets and comparisons with other clustering algorithms show that the proposed algorithm achieves a better clustering effect.

Key words: density peak clustering algorithm, natural nearest neighbor, density-ratio, system evolution method, clustering
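
For orientation, the baseline DPC steps that the abstract builds on (local density, relative distance, ranking by their product, and assignment of the remaining samples) can be sketched as follows. This is only a minimal illustration of the standard cutoff-distance DPC, not the DS-DPC method: the natural nearest neighbor search, the density-ratio density formula, and the system evolution merge/split step are not specified in the abstract and are not implemented here. The function name dpc_baseline, the parameters d_c and n_centers, and the rule of taking the top n_centers ranking values as centers are assumptions made for this sketch.

```python
import math


def dpc_baseline(points, d_c, n_centers):
    """Baseline DPC sketch: cutoff-kernel density, top-ranked centers (assumed rule)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other samples closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit samples from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # Relative distance: distance to the nearest sample that ranks higher in density.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest sample has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
        else:
            j = min(order[:rank], key=lambda k: dist[i][k])
            delta[i] = dist[i][j]
            parent[i] = j

    # Ranking value: product of local density and relative distance, descending.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], i))[:n_centers]

    # Centers get their own label; every other sample inherits the label of its
    # nearest denser neighbor, which has already been labeled in this order.
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two well-separated toy groups, each with one dense middle sample.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = dpc_baseline(points, d_c=1.0, n_centers=2)
print(centers)  # [4, 9]
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this baseline both d_c and n_centers must be supplied by hand, which is the dependence the abstract says DS-DPC is designed to avoid.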
