Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (13): 47-53. DOI: 10.3778/j.issn.1002-8331.1905-0171

• Theory, Research and Development •

Optimized Density Peak Clustering Algorithm in Physics

JIA Lu, ZHANG Desheng, LV Duanduan   

  1. School of Science, Xi’an University of Technology, Xi’an 710054, China
  • Online: 2020-07-01 Published: 2020-07-02

Abstract:

The Density Peak Clustering (DPC) algorithm has two weaknesses: the cutoff distance used to compute each sample's local density is chosen arbitrarily, and the remaining (non-center) sample points are assigned to clusters with a high error rate. To address these problems, a physics-inspired improvement of DPC, named W-DPC, is proposed. First, the local density of a sample is defined by the law of universal gravitation. Then, based on the first cosmic velocity, a two-step strategy is established for assigning the remaining sample points: points that must belong to a cluster are assigned first, followed by points that may belong to a cluster, which makes the assignment more accurate. Finally, W-DPC is tested on synthetic datasets and on real datasets from the UCI repository, and compared with the KNN-DPC, DPC, DBSCAN, AP and K-Means algorithms. The numerical experiments show that W-DPC produces clearly better clustering results than the other algorithms.

Key words: density peak clustering algorithm, cluster analysis, law of gravitation, local density, first cosmic velocity
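
For orientation, the baseline DPC procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard cutoff-kernel DPC, not of W-DPC: the abstract does not give the gravitation-based density or the first-cosmic-velocity assignment rules, so they are not reproduced here. The function name `dpc` and the parameters `d_c` and `n_clusters` are assumptions made for this sketch. It shows the two points the abstract criticizes: the density depends directly on the hand-picked cutoff `d_c`, and each remaining point simply inherits the label of its nearest higher-density neighbor, so one wrong assignment can propagate.

```python
import numpy as np

def dpc(points, d_c, n_clusters):
    """Baseline cutoff-kernel DPC (illustrative sketch, not W-DPC)."""
    n = len(points)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density: number of other samples closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit samples from highest to lowest density (ties broken by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest sample: delta is its largest distance to any sample.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to nearest higher-density sample
        parent[i] = j
    # Centers are the samples with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # assumption: the densest sample is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Remaining samples inherit the label of their nearest higher-density sample.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta

# Two well-separated square blobs, each with one sample in the middle.
blob = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
points = np.vstack([blob, blob + 10.0])
labels, rho, delta = dpc(points, d_c=1.0, n_clusters=2)
```

On this toy input the two middle samples have the highest density and become the centers, and every corner sample joins the blob of its own middle sample. Changing `d_c` changes `rho` directly, which is the sensitivity that a cutoff-free density definition is meant to avoid.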