Computer Engineering and Applications, 2023, Vol. 59, Issue (23): 73-85. DOI: 10.3778/j.issn.1002-8331.2212-0234

• Theory and Research & Development •

Adaptive Multi-Density Peak Sub-Cluster Fusion Clustering Algorithm

CHEN Di, DU Tao, ZHOU Jin, WU Yunzheng, WANG Xingeng   

  1. College of Information Science and Engineering, University of Jinan, Jinan 250024, China
    2. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250024, China
  • Online: 2023-12-01 Published: 2023-12-01

Abstract: The classical density peak clustering (DPC) algorithm has three weaknesses: its local density estimate depends heavily on the cutoff distance; its assignment of non-center points is prone to chain effects, in which one misassigned point passes its wrong label on to the points assigned after it; and manual selection of cluster centers makes it difficult to find the centers of clusters with uneven density. To address these problems, an adaptive multi-density peak sub-cluster fusion clustering algorithm is proposed. To exploit the neighborhood information of each sample, the idea of natural neighbors is introduced into DPC so that the local density of every sample point is computed adaptively. To find the centers of sparse clusters, an automatic cluster center selection strategy is proposed to determine the initial sub-cluster centers, and a two-stage assignment strategy is used to reduce the probability of chain effects. Finally, a similarity measure based on K-nearest neighbors is proposed, and sub-clusters with high similarity are merged to obtain the final clustering result. On two-dimensional synthetic datasets and UCI datasets, the algorithm performs better than classical DPC and its recently proposed improvements.

Key words: natural neighbors, density peak clustering, multi-sub-cluster fusion, assignment strategy
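
The abstract gives no formulas, so the following is only a minimal, dependency-free Python sketch of the general pipeline it describes, under stated assumptions. It uses the commonly used natural-neighbor search (grow the neighborhood size r until every point is some other point's r-nearest neighbor, or the number of such "orphan" points stops changing) to fix the neighborhood size without a cutoff distance. It then feeds a hypothetical natural-neighbor density, rho_i = |NaN(i)| / (sum of d(i, j) over NaN(i)), into the standard DPC quantities delta (distance to the nearest denser point) and gamma = rho * delta. Taking the top-k gamma points as centers and giving every other point the label of its nearest denser point is the classic one-step DPC assignment, not the paper's automatic center selection or two-stage assignment, and the K-nearest-neighbor sub-cluster fusion step is not modeled. All function names (`natural_neighbor_search`, `nan_density`, `dpc_cluster`) are hypothetical.

```python
import math


def natural_neighbor_search(points):
    """Return (lam, nan, dist): natural eigenvalue, natural-neighbor sets, distances."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Other points sorted by distance from each point (ties broken by index).
    order = [sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
             for i in range(n)]
    reverse_count = [0] * n  # how many points have j among their r nearest neighbors
    r, prev_orphans = 0, None
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1
        orphans = reverse_count.count(0)
        # Stop when every point has a reverse neighbor or the orphan count stalls.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    knn = [set(order[i][:r]) for i in range(n)]
    # Natural neighbors of i: points that are mutual r-nearest neighbors with i.
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, nan, dist


def nan_density(nan, dist):
    """HYPOTHETICAL density (assumption): |NaN(i)| / summed distance to NaN(i)."""
    rho = []
    for i, nbrs in enumerate(nan):
        total = sum(dist[i][j] for j in nbrs)
        # Points with no natural neighbors (or only duplicates) get density 0.
        rho.append(len(nbrs) / total if total > 0 else 0.0)
    return rho


def dpc_cluster(points, k):
    """Classic DPC steps on the NaN density; top-k gamma centers are an assumption."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lam, nan, dist = natural_neighbor_search(points)
    rho = nan_density(nan, dist)
    n = len(points)
    rank = sorted(range(n), key=lambda i: (-rho[i], i))  # descending density
    d_max = max(max(row) for row in dist)
    delta, leader = [d_max] * n, [-1] * n  # the global peak keeps delta = d_max
    for pos in range(1, n):
        i = rank[pos]
        # Leader: nearest point with higher density (earlier in the ranking).
        leader[i] = min(rank[:pos], key=lambda j: (dist[i][j], j))
        delta[i] = dist[i][leader[i]]
    gamma = [r * d for r, d in zip(rho, delta)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], i))[:k]
    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    # Remaining points copy their leader's label in density order, so one wrong
    # label propagates to every point downstream of it (the chain effect).
    for i in rank:
        if labels[i] == -1:
            labels[i] = labels[leader[i]]
    return lam, centers, labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
lam, centers, labels = dpc_cluster(pts, k=2)
print(lam, centers, labels)  # 3 [4, 9] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

On the two toy blobs the search stops at r = 3, each blob's central point becomes a center, and every point joins its own blob. Plain lists keep the sketch self-contained; the all-pairs distance matrix and per-point sorting make it quadratic in memory and worse than quadratic in time, so real datasets would need a vectorized or tree-based neighbor search.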