Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (2): 148-153. DOI: 10.3778/j.issn.1002-8331.1710-0059

• Pattern Recognition and Artificial Intelligence •

Density Peaks Clustering Optimized by K Nearest Neighbor’s Similarity

ZHU Qingfeng1,2, GE Hongwei1,2

  1. Ministry of Education Key Laboratory of Advanced Process Control for Light Industry (Jiangnan University), Wuxi, Jiangsu 214122, China
  2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-01-15 Published: 2019-01-15

Abstract: In the assignment step of density peaks clustering, only the distance between a sample point and its pointing point (the nearest point with higher density) is considered, which makes the method unsuitable for manifold clustering (for example, on the Circleblock and Lineblobs data sets). To address this problem, a density peaks clustering algorithm optimized by K-nearest-neighbor similarity is proposed. After the density and the pointing point of every point have been computed, a similarity function is used to find the K nearest neighbors of each point. The K-nearest-neighbor information is then used to judge whether each sample point's pointing point is correct, and for every point whose pointing point is wrong a correct pointing point is searched for again, which effectively reduces assignment errors. Experiments on synthetic data sets and UCI data sets show that the new algorithm achieves higher accuracy.

Key words: clustering, density peaks, similarity, K nearest neighbor
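
To make the baseline step concrete, the following is a minimal Python sketch of how plain density peaks clustering computes each point's density and its nearest point with higher density (the pointing point), and then propagates labels along those pointers. It is an illustration under stated assumptions, not the authors' implementation: the cutoff-kernel density, the cutoff distance `d_c`, the hand-picked `centers`, and the function names `density_and_pointing` and `assign_labels` are all hypothetical choices made here. The K-nearest-neighbor similarity check proposed in the paper is not reproduced, because the abstract does not give the similarity function or the correction rule.

```python
import math


def density_and_pointing(points, d_c):
    """Return (rho, pointing, order) for a list of coordinate tuples.

    rho[i]      : number of other points closer than d_c (assumed cutoff kernel)
    pointing[i] : index of the nearest point ranked denser than i, or -1
    order       : point indices sorted from densest to sparsest
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # sorted() is stable, so equal densities are ranked by index; this gives
    # every point except the first a non-empty set of "denser" candidates.
    order = sorted(range(n), key=lambda i: -rho[i])
    pointing = [-1] * n
    for rank in range(1, n):
        i = order[rank]
        pointing[i] = min(order[:rank], key=lambda j: dist[i][j])
    return rho, pointing, order


def assign_labels(pointing, order, centers):
    """Give each non-center point the label of its pointing point."""
    if not order:
        return []
    if order[0] not in centers:
        raise ValueError("the densest point must be one of the centers")
    labels = [-1] * len(order)
    for label, c in enumerate(centers):
        labels[c] = label
    # Densest first, so a pointing point is always labeled before its followers.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[pointing[i]]
    return labels


# Two small, well-separated groups; centers are chosen by hand (assumption).
points = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
rho, pointing, order = density_and_pointing(points, d_c=1.0)
labels = assign_labels(pointing, order, centers=[3, 7])
print(rho)
print(pointing)
print(labels)
```

Because every label is inherited through a single pointer chosen by distance alone, one wrong pointer mislabels the point and everything that points to it. This is the failure mode on manifold-shaped data that the abstract describes.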