Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (24): 78-87.DOI: 10.3778/j.issn.1002-8331.2302-0037

• Theory, Research and Development • Previous Articles     Next Articles

k-NN Density Dominator Component Delegations Based Density Peaks Clustering

LYU Hongzhang, YANG Yiyang, YANG Geping, GONG Zhiguo   

  1. 1.School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
    2.Department of Computer and Information Science, University of Macao, Macao 999078, China
  • Online:2023-12-15 Published:2023-12-15

k近邻密度支配域代表团密度峰值聚类算法

吕鸿章,杨易扬,杨戈平,巩志国   

  1. 1.广东工业大学 计算机学院,广州 510006
    2.澳门大学 计算机与信息科学系,澳门 999078

Abstract: DPC(clustering by fast search and find of density peaks) is inefficient in processing large-scale clustering. [k](lower case)-NN density dominator component skill can improve such shortcoming. However, representative data points in such skill could have poor ability on representation, which leads to lower clustering quality. The delegation sampling strategy can be used as an improvement on the above issue. The resulting new algorithm not only inherits the efficient characteristics of density dominator component acceleration skill, but also ensures the quality of clustering. This algorithm first constructs [k]-nearest neighbor graph. Then, kernel density is estimated and density dominator component is built. Thirdly, each density dominator component is sampled from its high and low density area, and similarity is computed between each dominator via delegations’ nearest neighbor relationship. Finally, DPC algorithm is conducted with each domain as the data point. The experiments show that the introduction of delegations strategy can improve the performance of the original DPC, and the clustering results are better than some other density clustering algorithms.

Key words: density peak clustering, [k]-nearest neighbor graph, density dominator component, delegations strategy, large-scale clustering

摘要: 密度峰值聚类(clustering by fast search and find of density peaks,DPC)算法在应对大规模聚类时效率不高。[k]近邻密度支配域小团簇加速技巧可以很好地改善该短板,但存在代表点代表能力不足的问题,从而影响聚类质量。代表团采样策略可作为上述问题的改进方式。由此形成的新算法不仅继承了原有密度支配域小团簇加速技巧的高效特性,还保证了聚类的质量。算法构建[k]近邻图。再利用[k]近邻图进行核密度估计并构建若干个密度支配域。对各密度支配域分别从高低密度区域采样支配域代表团。利用代表团的近邻关系计算域间相似度。将各支配域视为新样本点,执行DPC算法完成聚类。实验证明,引入代表团策略对DPC算法有一定的提升,聚类效果比部分密度聚类算法更好。

关键词: 密度峰值聚类, [k]近邻图, 密度支配域, 代表团策略, 大规模聚类