Improved density peaks clustering algorithm combining K-Nearest Neighbors

doi:10.3778/j.issn.1002-8331.1801-0013

Abstract

The Density Peaks Clustering (DPC) algorithm performs poorly on datasets with high dimensionality, noise, and complex structure. To address this problem, an Improved Density Peaks Clustering Algorithm (IDPCA) combining K-Nearest Neighbors is proposed. First, a new definition of local density is given to describe the spatial distribution of the samples. Second, the concept of a core point is introduced, and a global search allocation strategy based on the idea of K-Nearest Neighbors is designed. It repeatedly assigns the unassigned K-nearest neighbors of core points to the correct cluster, which accelerates clustering. Third, a statistical learning allocation strategy is developed: the weighted K-Nearest Neighbors information of each remaining unassigned point is used to calculate its probability of being assigned to each local cluster, which effectively improves clustering quality. Finally, IDPCA is compared with DPC and three other classical clustering methods on 21 test datasets, both synthetic and real-world, and the experimental results show that IDPCA outperforms them on four different evaluation indexes.

Key words: data mining, clustering algorithm, local density, density peaks, K-Nearest Neighbors
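The abstract does not reproduce IDPCA's local-density formula, so the sketch below assumes a KNN-based density commonly used in KNN variants of DPC, ρ_i = exp(−(1/K)·Σ_{j∈KNN(i)} d_ij²), together with the standard DPC quantity δ_i (the distance to the nearest point of higher density). It only illustrates how a K-nearest-neighbor density plugs into DPC's decision quantities; it is not the paper's definition, and the function name `knn_density_and_delta` is hypothetical. As in DPC, cluster centers are taken to be the points with the largest γ_i = ρ_i·δ_i.

```python
import numpy as np


def knn_density_and_delta(X, k):
    """Return (rho, delta, knn) for every sample in X.

    Assumption: rho_i = exp(-(1/k) * sum_j d_ij^2) over the k nearest
    neighbours j of i. This KNN-style density is illustrative only; it is
    not claimed to be the density defined for IDPCA.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances (O(n^2) memory, acceptable for a sketch).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (assumes no duplicate points).
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]
    knn_d = np.take_along_axis(dist, knn, axis=1)
    rho = np.exp(-(knn_d ** 2).mean(axis=1))
    # delta_i: distance to the nearest denser point; a point with no denser
    # point gets its largest distance to any point (usual DPC convention).
    delta = np.empty(n)
    for i in range(n):
        denser = np.flatnonzero(rho > rho[i])
        delta[i] = dist[i, denser].min() if denser.size else dist[i].max()
    return rho, delta, knn


# Usage: two well-separated Gaussian blobs; pick 2 centers by gamma = rho * delta.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
rho, delta, knn = knn_density_and_delta(X, k=5)
gamma = rho * delta
centers = np.argsort(gamma)[::-1][:2]
```

On this toy data the two largest γ values belong to the densest point of each blob, because each blob's peak is far from any denser point while every other point has a denser neighbor nearby. The full distance matrix keeps the sketch short; larger datasets would call for a spatial index.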

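Abstract (Chinese version): To address the poor clustering performance of the density peaks clustering algorithm (DPC) on datasets with high dimensionality, noise, and complex structure, an improved density peaks clustering algorithm combining K-nearest neighbors (IDPCA) is proposed. The algorithm first gives a new local density measure to describe the distribution of each sample in space. It then introduces the concept of core points and, combined with the idea of K-nearest neighbors, designs a global search allocation strategy that speeds up clustering by repeatedly assigning the unassigned K-nearest neighbors of core points to the correct cluster. It further proposes a statistical learning allocation strategy based on weighted K-nearest neighbors, which uses the weighted K-nearest-neighbor information of the remaining points to determine their probability of being assigned to each local cluster, effectively improving clustering quality. Experimental results show that IDPCA applies well to 21 typical test datasets, and its advantage is even more evident when its performance indexes are compared with those of DPC and three other typical clustering algorithms.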

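The two allocation strategies described in the abstract can be illustrated with the minimal sketch below. Its details are assumptions, since the abstract does not specify them: the density is an assumed KNN form (ρ_i = exp of minus the mean squared distance to the K nearest neighbors), a "core point" is hypothetically any point whose density is at least the mean density, each labelled neighbor votes with a hypothetical weight exp(−d), and a nearest-labelled-point fallback handles points with no labelled neighbors. The function names `knn_graph` and `allocate` are hypothetical, and the cluster centers are supplied by the caller (for example, from DPC's decision graph).

```python
from collections import deque

import numpy as np


def knn_graph(X, k):
    """Pairwise distances and k-nearest-neighbor indices (assumes no duplicate points)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip column 0, the point itself
    return dist, knn


def allocate(X, k, centers):
    dist, knn = knn_graph(X, k)
    n = dist.shape[0]
    knn_d = np.take_along_axis(dist, knn, axis=1)
    rho = np.exp(-(knn_d ** 2).mean(axis=1))  # assumed KNN density
    is_core = rho >= rho.mean()               # hypothetical core-point rule
    labels = np.full(n, -1)

    # Stage 1 (global search): spread labels breadth-first from each center;
    # only core points pass their label on to their unassigned K nearest neighbors.
    for c, centre in enumerate(centers):
        labels[centre] = c
    queue = deque(centers)
    while queue:
        p = queue.popleft()
        if not is_core[p]:
            continue
        for q in knn[p]:
            if labels[q] == -1:
                labels[q] = labels[p]
                queue.append(q)

    # Stage 2 (weighted-KNN vote): repeatedly assign the unassigned point whose
    # most probable cluster has the highest probability, then re-vote.
    n_clusters = len(centers)
    while (labels == -1).any():
        best_p, best_prob, best_label = None, -1.0, -1
        for p in np.flatnonzero(labels == -1):
            votes = np.zeros(n_clusters)
            for q, d in zip(knn[p], knn_d[p]):
                if labels[q] != -1:
                    votes[labels[q]] += np.exp(-d)  # hypothetical weight
            if votes.sum() > 0:
                prob = votes / votes.sum()
                if prob.max() > best_prob:
                    best_p, best_prob, best_label = p, prob.max(), int(prob.argmax())
        if best_p is None:
            # No unassigned point has a labelled neighbor: copy the label of the
            # closest (unassigned, labelled) pair so the loop always progresses.
            un = np.flatnonzero(labels == -1)
            lab = np.flatnonzero(labels != -1)
            sub = dist[np.ix_(un, lab)]
            i, j = np.unravel_index(sub.argmin(), sub.shape)
            best_p, best_label = un[i], labels[lab[j]]
        labels[best_p] = best_label
    return labels


# Usage on two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
# Hypothetical centers: the sample closest to each blob's generating mean.
centers = [int(np.argmin(np.linalg.norm(X - m, axis=1))) for m in ([0.0, 0.0], [5.0, 5.0])]
labels = allocate(X, k=5, centers=centers)
```

Stage 1 assigns points quickly through dense regions, while Stage 2 commits only the single most confident remaining point per round, so later votes can use earlier decisions. Every pass of the outer loop labels exactly one point, which guarantees termination.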

XUE Xiaona, GAO Shuping, PENG Hongming, WU Huihui. Improved density peaks clustering algorithm combining K-Nearest Neighbors[J]. Computer Engineering and Applications, 2018, 54(7): 36-43.

薛小娜，高淑萍，彭弘铭，吴会会. 结合K近邻的改进密度峰值聚类算法[J]. 计算机工程与应用, 2018, 54(7): 36-43.