计算机工程与应用 ›› 2025, Vol. 61 ›› Issue (1): 165-173.DOI: 10.3778/j.issn.1002-8331.2403-0075

• 理论与研发 • 上一篇    下一篇

基于反向最近邻的密度估计聚类算法

许梅梅,侯新民   

  1. 1.中国科学技术大学 大数据学院,合肥 230026
    2.中国科学技术大学 数学科学学院,合肥 230026
    3.中国科学技术大学 中国科学院吴文俊数学重点实验室,合肥 230026
    4.合肥国家实验室,合肥 230088
  • 出版日期:2025-01-01 发布日期:2024-12-31

Density Estimation Clustering Method Based on Reverse Nearest Neighbor

XU Meimei, HOU Xinmin   

  1. 1.School of Data Science, University of Science and Technology of China, Hefei 230026, China
    2.School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China
    3.CAS Key Laboratory of Wu Wen-Tsun Mathematics, University of Science and Technology of China, Hefei 230026, China
    4.Hefei National Laboratory, Hefei 230088, China
  • Online:2025-01-01 Published:2024-12-31

摘要: 基于相互最近邻的密度峰聚类算法(DenMune)通过相互最近邻计算数据点的局部密度,是一种有效的聚类手段。但该算法存在构建聚类骨架不合理的问题,在分配弱点时采用硬投票策略,易产生错误。因此提出一种新的基于反向最近邻的密度估计聚类算法(RNN-DEC)。该算法引入反向最近邻来计算数据点的局部密度,将数据点分成强点、弱点和噪声点。使用强点构建聚类算法的骨架,通过软投票的方式将弱点分配到与其相似度最高的簇中去。提出了一种基于反向最近邻的簇融合算法,将相似度高的子簇融合,得到最终的聚类结果。实验结果表明,在一些合成数据集和UCI真实数据集上,相比较于其他经典算法,该算法具有更好的聚类效果。

关键词: 反向最近邻, 局部密度, 密度聚类算法, 子簇融合

Abstract: The density peak clustering algorithm based on mutual nearest neighbors (DenMune) is an effective clustering method that calculates the local density of data points through mutual nearest neighbors. However, this algorithm has the problem of unreasonable construction of clustering skeletons, and using a hard voting strategy when allocating weaknesses can easily lead to errors. Therefore, a new density estimation clustering algorithm based on reverse nearest neighbor (RNN-DEC) is proposed. This algorithm introduces reverse nearest neighbors to calculate the local density of data points, dividing them into strong points, weak points, and noisy points. It builds the skeleton of the clustering algorithm using strong points and assigns weaknesses to the cluster with the highest similarity through soft voting. Finally, a cluster fusion algorithm based on reverse nearest neighbor is proposed, which fuses high similarity sub clusters to obtain the final clustering result. The experimental results show that compared to other classical algorithms, this algorithm has better clustering performance on some synthetic datasets and UCI real datasets.

Key words: reverse nearest neighbor, local density, density clustering algorithm, subcluster fusion