Computer Engineering and Applications, 2019, Vol. 55, Issue (24): 122-127. DOI: 10.3778/j.issn.1002-8331.1901-0086

• Pattern Recognition and Artificial Intelligence •

Adaptive Fast Search Density Peak Clustering Algorithm

WANG Junhua, LI Jianjun, LI Junshan, LAI Wenda

  1. Institute of Intelligent Information, South China Business College, Guangdong University of Foreign Studies, Guangzhou 510545, China
  • Online: 2019-12-15 Published: 2019-12-11

Abstract: Clustering by Fast Search and Find of Density Peaks (CFSFDP) is a simple, efficient density-based clustering algorithm that needs few parameters, but it has two drawbacks: the cut-off distance parameter and the cluster centers must be chosen manually. To overcome these drawbacks, an adaptive fast search density peak clustering algorithm is proposed. To determine the cut-off distance parameter, an information entropy of the local density is constructed as a function of the cut-off distance, and the parameter is chosen adaptively by minimizing this entropy. To determine the cluster centers, the algorithm exploits the fact that the product of local density and distance jumps markedly between non-center points and cluster centers; this jump is used to set a threshold, so the cluster centers are identified automatically. Experimental results show that the proposed algorithm achieves good clustering performance without manual selection of the cut-off distance parameter or the cluster centers.

Key words: clustering, density peak, cut-off distance parameter, local density
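
The two adaptive steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption rather than the authors' method: a Gaussian kernel for the local density (including each point's own contribution), the Shannon entropy of the normalized densities as the quantity to minimize, a percentile grid of candidate cut-off distances, and the largest gap in the sorted density-distance product as the "jump" that sets the threshold. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    # Euclidean distance matrix via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def local_density(D, dc):
    # ASSUMPTION: Gaussian-kernel density; each point counts itself,
    # so every density is at least 1.
    return np.exp(-(D / dc) ** 2).sum(axis=1)

def density_entropy(rho):
    # ASSUMPTION: Shannon entropy of the normalized local densities.
    p = rho / rho.sum()
    return float(-(p * np.log(p)).sum())

def choose_cutoff(D, candidates):
    # Pick the candidate cut-off distance with the smallest entropy.
    entropies = [density_entropy(local_density(D, dc)) for dc in candidates]
    return float(candidates[int(np.argmin(entropies))])

def delta_distance(D, rho):
    # delta_i: distance to the nearest point of higher density
    # (ties broken by rank in the sorted order).
    n = len(rho)
    order = np.argsort(-rho)
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j
    return delta, nearest_higher, order

def pick_centers(rho, delta):
    # ASSUMPTION: the "jump" is the largest gap in the sorted product
    # gamma = rho * delta; points above that gap are taken as centers.
    gamma = rho * delta
    idx = np.argsort(-gamma)
    gaps = gamma[idx][:-1] - gamma[idx][1:]
    k = int(np.argmax(gaps)) + 1
    return idx[:k]

def cluster(X, candidates=None):
    X = np.asarray(X, dtype=float)
    D = pairwise_distances(X)
    if candidates is None:
        # ASSUMPTION: search the 1st to 20th percentiles of pairwise distances.
        upper = D[np.triu_indices(len(X), k=1)]
        candidates = np.percentile(upper, np.arange(1, 21))
    dc = choose_cutoff(D, candidates)
    rho = local_density(D, dc)
    delta, nearest_higher, order = delta_distance(D, rho)
    centers = pick_centers(rho, delta)
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(len(centers))
    # Barring exact ties, the densest point has the largest gamma and is
    # always a center, so every nearest_higher lookup is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, dc
```

Calling `cluster(X)` on an `(n, d)` array returns the labels, the indices of the selected centers, and the chosen cut-off distance. The largest-gap rule is only one way to turn the jump into a threshold; the paper may define it differently.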