Computer Engineering and Applications, 2019, Vol. 55, Issue (20): 43-51. DOI: 10.3778/j.issn.1002-8331.1903-0246

• Theory, Research and Development •

Density Peak Clustering Based on Shared k-Nearest Neighbors and Shared Reverse Nearest Neighbors

GAO Yue, YANG Xiaofei, MA Yingcang, WANG Yirui

  1. School of Science, Xi’an Polytechnic University, Xi’an 710600, China
  2. School of Mathematics and Statistics, Ankang University, Ankang, Shaanxi 725000, China
  • Online: 2019-10-15 Published: 2019-10-14

Abstract: To better handle uneven density and to measure similarity in high-dimensional data, a density peak clustering algorithm based on shared k-nearest neighbors and shared reverse nearest neighbors is proposed. The algorithm first counts the shared k-nearest neighbors and the shared reverse nearest neighbors of two points, and combines these counts with the Euclidean distance to determine the shared similarity between the two points. It then defines the shared density of a point as the sum of the shared similarities between that point and its reverse nearest neighbors, and selects cluster centers by shared density. Experiments on artificial and real datasets show that the algorithm clusters more accurately than other density-based clustering algorithms, handles uneven density better, and improves clustering accuracy on high-dimensional data.

Key words: density peak clustering, shared k-nearest neighbors and shared reverse nearest neighbors, shared similarity, shared density
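
The pipeline outlined in the abstract (neighbor sets, shared similarity, shared density, center selection) can be sketched roughly as follows. The abstract does not state how the two shared-neighbor counts are combined with the Euclidean distance, so the formula used here, (shared k-nearest neighbors + shared reverse nearest neighbors) / (1 + distance), is only an illustrative assumption and not the authors' definition. The function names are likewise hypothetical.

```python
import math


def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    n = len(points)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort the other points by Euclidean distance to point i (ties keep index order).
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        result.append(set(others[:k]))
    return result


def reverse_nn_sets(knn):
    """For each point i, return the set of points that count i among their k nearest neighbors."""
    rnn = [set() for _ in knn]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rnn[i].add(j)
    return rnn


def shared_similarity(i, j, points, knn, rnn):
    """ASSUMED form: shared-neighbor counts damped by Euclidean distance.

    The paper's actual formula is not given in the abstract.
    """
    shared = len(knn[i] & knn[j]) + len(rnn[i] & rnn[j])
    return shared / (1.0 + math.dist(points[i], points[j]))


def shared_density(points, k):
    """Shared density of each point: sum of shared similarities to its reverse nearest neighbors."""
    knn = knn_sets(points, k)
    rnn = reverse_nn_sets(knn)
    return [
        sum(shared_similarity(i, j, points, knn, rnn) for j in rnn[i])
        for i in range(len(points))
    ]


# Toy data: four corners of a unit square, its center, and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
rho = shared_density(points, k=2)
# Candidate cluster center: the point with the largest shared density.
center = max(range(len(points)), key=lambda i: rho[i])
print(rho, center)
```

In this toy data the isolated point is nobody's nearest neighbor, so its reverse nearest neighbor set is empty and its shared density is zero under any choice of similarity formula. This sketch ranks points by shared density alone, since that is the only selection criterion the abstract names.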