Comparative Density Peaks Clustering Based on K-Nearest Neighbors

doi:10.3778/j.issn.1002-8331.1808-0006

Abstract

The clustering quality of the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm depends heavily on a subjectively chosen cutoff distance dc, whose optimal value is hard to determine. Moreover, on data sets with complex structure and large variations in density, the decision graph produced by CFSFDP does not separate cluster centers from non-center points clearly enough, which makes selecting the cluster centers difficult. To address these problems, a Comparative Density Peaks Clustering algorithm based on K-Nearest Neighbors (CDPC-KNN) is proposed. The algorithm uses the K-nearest-neighbor concept to redefine how the cutoff distance and the local density are measured: it generates the cutoff distance adaptively for any data set and makes the computed local density more consistent with the true distribution of the data. In addition, a distance comparison quantity replaces the original distance parameter in the decision graph, so that cluster centers stand out more clearly. Experiments show that CDPC-KNN clusters better overall than both CFSFDP and DBSCAN, and a separation experiment shows that the new algorithm effectively improves the discrimination between cluster centers and non-center points.

Key words: clustering algorithm, density peaks, K-nearest neighbors, decision graph, cluster centers
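
The abstract gives no formulas, so the following is only a minimal sketch of the decision-graph quantities it refers to. The local density rho and the distance delta follow the standard CFSFDP definitions (cutoff kernel). The rule used here to derive the truncation (cutoff) distance dc from K-nearest neighbors, namely the mean distance to each point's k-th nearest neighbor, is an illustrative assumption and not necessarily the definition used in CDPC-KNN. The distance comparison quantity is not sketched because the abstract does not define it. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_cutoff(D, k):
    """ASSUMPTION: dc = mean distance to the k-th nearest neighbor.

    Column 0 of each sorted row is the point itself (distance 0),
    so column k holds the k-th nearest neighbor distance.
    """
    return float(np.sort(D, axis=1)[:, k].mean())

def decision_graph(D, dc):
    """Standard CFSFDP quantities (cutoff kernel).

    rho[i]   = number of other points closer than dc to point i
    delta[i] = distance to the nearest point with higher rho,
               or the largest distance from i if no such point exists
    """
    n = D.shape[0]
    rho = (D < dc).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    return rho, delta

# Two small, well-separated groups; indices 0 and 5 are the group centers.
offsets = np.array([[0, 0], [0.1, 0], [-0.1, 0], [0, 0.1], [0, -0.1]])
X = np.vstack([offsets, offsets + 5.0])

D = pairwise_distances(X)
dc = knn_cutoff(D, k=2)
rho, delta = decision_graph(D, dc)

# Points with large rho * delta are the cluster-center candidates.
centers = sorted(np.argsort(rho * delta)[-2:].tolist())
print(centers)  # [0, 5]
```

In this toy example the two centers have both high density and a large delta, so they stand apart from all other points on the decision graph. On data with uneven density that gap narrows, which is the situation the abstract says the comparison quantity is meant to improve.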


DU Pei, CHENG Xiaorong. Comparative Density Peaks Clustering Based on K-Nearest Neighbors[J]. Computer Engineering and Applications, 2019, 55(10): 161-168.
