Computer Engineering and Applications, 2023, Vol. 59, Issue (21): 91-101. DOI: 10.3778/j.issn.1002-8331.2207-0446

• Theory, Research and Development •

Density Peak Clustering Algorithm Optimized by Adaptive Clustering Centers Strategy

XU Tongtong, XIE Bin, ZHANG Ximei, ZHANG Chunhao

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
  2. Hebei Provincial Key Laboratory of Network and Information Security, Hebei Normal University, Shijiazhuang 050024, China
  3. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Hebei Normal University, Shijiazhuang 050024, China
  • Online: 2023-11-01  Published: 2023-11-01

Abstract: The density peak clustering (DPC) algorithm is a simple and efficient unsupervised clustering algorithm that quickly finds clustering centers and completes the clustering. However, DPC defines local density through a truncation distance, which ignores the spatial distribution of the sample points; it selects clustering centers from a decision graph, which is highly subjective; and it assigns the remaining sample points with a single allocation strategy, so one misassigned point can drag others into the wrong cluster. To address these problems, a density peak clustering algorithm optimized by an adaptive clustering centers strategy (ADPC) is proposed. Shared nearest neighbors are used to define the similarity between two points, and the local density is redefined on this basis so that it reflects the spatial distribution of the samples. The "inflection point" of γ, the product of the sample density ρ and the relative distance δ, is determined from the slope difference between adjacent points, and a power-function transformation is applied to γ to sharpen the distinction between potential clustering centers and non-center points. A decision function identifies the potential clustering centers, and the real clustering centers are then determined adaptively from the mean distance between the potential centers. The allocation strategy for non-center points is also optimized. Experiments on UCI and synthetic datasets show that the algorithm selects clustering centers adaptively and accurately, and improves clustering performance to some extent.

Key words: density peak clustering, shared nearest neighbors, slope difference, adaptive, decision function
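
The quantities named in the abstract can be illustrated with a minimal Python sketch. It computes ρ, δ, and γ = ρ·δ for the classic DPC baseline (truncation-distance density), not for ADPC's shared-nearest-neighbor density, and then looks for the sharpest bend in the sorted γ values using the difference between adjacent slopes. The function names, the parameter `d_c`, the toy data, and the "largest slope difference" rule are all assumptions made for illustration; the abstract does not specify the paper's power transformation or decision function, so they are not shown.

```python
import math


def dpc_quantities(points, d_c):
    """Return (rho, delta, gamma) for classic truncation-distance DPC."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the truncation distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # distances to every point with strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # a point with no denser neighbor gets its largest distance instead
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


def knee_by_slope_difference(gamma):
    """Count the sorted gamma values that precede the sharpest bend.

    Assumed rule: sort gamma in descending order, take slopes between
    adjacent points (unit spacing), and pick the largest slope change.
    """
    g = sorted(gamma, reverse=True)
    slopes = [g[k + 1] - g[k] for k in range(len(g) - 1)]
    diffs = [slopes[k + 1] - slopes[k] for k in range(len(slopes) - 1)]
    k = max(range(len(diffs)), key=lambda idx: diffs[idx])
    return k + 1


# Toy data (hypothetical): two tight groups centered at (0, 0) and (5, 5).
points = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
          (5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9)]
rho, delta, gamma = dpc_quantities(points, d_c=0.12)
n_candidates = knee_by_slope_difference(gamma)
order = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)
centers = sorted(order[:n_candidates])
print(n_candidates, centers)
```

On this toy set the two group centers have far larger γ than every other point, so the bend in the sorted γ curve separates them from the rest. Real data rarely separates this cleanly, which is the motivation the abstract gives for transforming γ before choosing centers.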