Density Peak Clustering Algorithm Optimized by Adaptive Clustering Centers Strategy

doi:10.3778/j.issn.1002-8331.2207-0446

Abstract

The density peak clustering (DPC) algorithm is a simple and efficient unsupervised clustering algorithm that quickly finds the clustering centers to complete clustering. However, it defines local density by a truncation distance without considering the spatial distribution of the sample points; selecting clustering centers from the decision graph is highly subjective; and its single allocation strategy for sample points easily causes chain-reaction assignment errors. Therefore, a density peak clustering algorithm optimized by shared nearest neighbors and an adaptive clustering centers strategy (ADPC) is proposed. Shared nearest neighbors are used to define the similarity between two points, and the local density is redefined so that it reflects the spatial distribution of the samples. The γ value is the product of the sample density ρ and the relative distance δ. The “inflection point” of the γ values is determined from the slope difference between adjacent points, and a power transformation of γ increases the separation between potential clustering centers and non-center points. A decision function then identifies the potential clustering centers, and the mean distance between the potential clustering centers is used to determine the real clustering centers adaptively. The allocation strategy for non-center points is also optimized. Experiments on UCI and synthetic datasets show that the algorithm selects the clustering centers adaptively and improves clustering performance to some extent.

Key words: density peak clustering, shared neighbors, slope difference, adaptive, decision function
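For orientation, the baseline quantities named in the abstract can be sketched as follows. This is a minimal illustration of the standard DPC definitions (density counted within a truncation distance, relative distance to the nearest denser point, and γ = ρ·δ), not of the ADPC redefinitions, whose formulas are not given in this abstract. The function name `dpc_quantities`, the parameter name `d_c`, and the toy data are assumptions made for the example.

```python
import math

def dpc_quantities(points, d_c):
    """Baseline DPC sketch: returns (rho, delta, gamma) for a list of points.

    d_c is the truncation (cutoff) distance; the name is an assumption.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points closer than the truncation distance
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    delta = []
    for i in range(n):
        # distances from point i to every point of strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # a point with no denser neighbor takes its largest distance instead
        delta.append(min(higher) if higher else max(dist[i]))

    # gamma_i = rho_i * delta_i; large values suggest clustering centers
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma

# Toy data (hypothetical): two well-separated groups of five points each
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, gamma = dpc_quantities(points, d_c=0.8)

# Indices of the two largest gamma values, taken here as candidate centers
candidates = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

In baseline DPC the number of centers (two in this toy case) is read off the decision graph by hand. That manual step is what the abstract describes replacing with the slope-difference inflection point, the power transformation of γ, and the decision function.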

XU Tongtong, XIE Bin, ZHANG Ximei, ZHANG Chunhao. Density Peak Clustering Algorithm Optimized by Adaptive Clustering Centers Strategy[J]. Computer Engineering and Applications, 2023, 59(21): 91-101.
