计算机工程与应用 ›› 2021, Vol. 57 ›› Issue (20): 90-96.DOI: 10.3778/j.issn.1002-8331.2009-0284

• 理论与研发 • 上一篇    下一篇

Ball-Tree优化的密度峰值聚类算法

丁松阳,田青云   

  1. 河南财经政法大学 计算机与信息工程学院,郑州 450046
  • 出版日期:2021-10-15 发布日期:2021-10-21

Density Peak Clustering Algorithm Based on Ball-Tree

DING Songyang, TIAN Qingyun   

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Online:2021-10-15 Published:2021-10-21

摘要:

针对密度峰值聚类算法DPC(clustering by fast search and find of density peaks)时间复杂度高、准确度低的缺陷,提出了一种基于Ball-Tree优化的快速密度峰值聚类算法BT-DPC。算法利用第[k]近邻度量样本局部密度,通过构建Ball-Tree加速密度[ρ]及距离[δ]的计算;在类簇分配阶段,结合[k]近邻思想设计统计学习分配策略,将边界点正确归类。通过在UCI数据集上的实验,将该算法与原密度峰值聚类算法及其改进算法进行了对比,实验结果表明,BT-DPC算法在降低时间复杂度的同时提高了聚类的准确度。

关键词: 聚类算法, ball-tree, 密度峰值聚类, 分配策略

Abstract:

In order to overcome the deficiencies of clustering by fast search and find of density peaks(DPC) for its high time complexity and low accuracy, an optimized fast density peak clustering algorithm is proposed based on Ball-Tree in this paper(BT-DPC). The algorithm defines local density of a point based on [k]-nearest neighbor, and constructs a ball tree to accelerate the calculation of the local density [ρ] and the distance [δ]. In the cluster allocation stage, the statistical learning allocation strategy is designed based on the [k]-nearest neighbors idea to classify the boundary points correctly. The experimental result shows that the BT-DPC algorithm can improve the time performance on the basis of increasing clustering quality compared with DPC algorithm and other popular clustering algorithms through the theory analysis and the experiments on several real-world datasets from the UCI machine learning repository.

Key words: clustering algorithm, ball-tree, clustering by fast search and find of density peaks(DPC), allocation strategy