Computer Engineering and Applications, 2023, Vol. 59, Issue (12): 84-93. DOI: 10.3778/j.issn.1002-8331.2203-0161

• Theory, Research and Development •

Density Peaking Clustering Algorithm Combining Hybrid Density and Local Structure

MA Zhenming, AN Junxiu, ZHOU Jun   

  1. College of Software Engineering, Chengdu University of Information Technology, Chengdu 610200, China
  • Online: 2023-06-15 Published: 2023-06-15

结合混合密度和局部结构的密度峰值聚类算法

马振明,安俊秀,周俊   

  1. 成都信息工程大学 软件工程学院,成都 610200

Abstract: Density peak clustering (DPC) is a simple and effective clustering algorithm, but it suffers from two defects: an inconsistency between its assumption and its implementation, and an assumption that does not hold on some data. As a result, DPC has difficulty identifying cluster centers on data sets with uneven density, and its strategy for assigning non-center points has low robustness and triggers chain reactions, in which one wrongly assigned point passes its label on to the points that depend on it. To address these problems, this paper proposes a density peak clustering algorithm combining hybrid density and local structure (HS-DPC). First, DPC assumes that cluster centers are local density peaks, yet its implementation selects global peaks; a hybrid density formula built from relative density and absolute density is proposed to eliminate this inconsistency. Second, the similarity between data points is redefined according to local structure, so that the algorithm adapts to data with complex shapes. Finally, similarity is transferred outward from each center point to search for valid data points, which form the backbone structure of the cluster; the remaining points are then assigned by optimal distance with reference to the distribution of backbone points across clusters, so that boundary clustering is completed without chain reactions. Comparative experiments against five other clustering algorithms on 16 data sets demonstrate the effectiveness and robustness of HS-DPC.

Key words: density peak clustering, hybrid density, local structure, similarity transfer, label propagation
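
For orientation, the baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration of the standard DPC procedure, not of HS-DPC: the abstract does not give the hybrid density formula, the redefined similarity, or the backbone construction, so none of them is reproduced here. The function name `dpc_baseline`, the parameters `d_c` and `n_clusters`, the Gaussian density kernel, and the toy data are all assumptions made for this example.

```python
import numpy as np

def dpc_baseline(points, d_c, n_clusters):
    """Sketch of standard DPC (hypothetical helper, not the paper's HS-DPC)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Density with a Gaussian kernel (assumed choice); drop the self term exp(0) = 1.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    # Indices sorted from highest to lowest density.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # The densest point has no denser neighbor; by convention it gets the largest distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                    # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]   # nearest of those points
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers are the points with the largest rho * delta. Because rho is
    # compared over the whole data set, these are global peaks.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser neighbor,
    # so a single wrong label is copied down the chain of dependents.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta

# Toy data (assumed): two well-separated groups of five points each.
group = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
pts = np.vstack([group, group + 10.0])
labels, centers, rho, delta = dpc_baseline(pts, d_c=1.0, n_clusters=2)
```

The last loop is where the chain reaction mentioned in the abstract originates: every label depends on a single denser neighbor, so nothing stops one error from spreading to all points downstream of it.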

摘要: 密度峰值聚类(DPC)是简单有效的聚类算法,但该算法存在假设与实现间不一致和假设不适用的问题,导致DPC在密度不均的数据集上很难确定中心点,且非中心点分配策略鲁棒性低并产生连锁反应。针对此,提出一种结合混合密度和局部结构的密度峰值聚类算法(HS-DPC)。利用相对密度和绝对密度,提出混合密度计算公式,消除DPC假设中心点为局部峰值,但算法实现是全局峰值间的不一致。根据局部结构重新定义数据点之间的相似性,从而适应形状复杂数据。对中心点依据相似性传递,搜索有效数据并形成簇的主干结构。对剩余点结合不同簇的主干点分布进行距离最优分配,隔绝连锁反应完成边界聚类。通过在16个数据集上与五种聚类算法进行对比实验,结果表明了HS-DPC的有效性和鲁棒性。

关键词: 密度峰值聚类, 混合密度, 局部结构, 相似性传递, 标签传播