Improved density peaks clustering algorithm combining K-Nearest Neighbors

Computer Engineering and Applications, 2018, Vol. 54, Issue (7): 36-43. DOI: 10.3778/j.issn.1002-8331.1801-0013

XUE Xiaona1, GAO Shuping1, PENG Hongming2, WU Huihui1

1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
2. School of Telecommunications Engineering, Xidian University, Xi’an 710071, China

Online: 2018-04-01 Published: 2018-04-16

Abstract

The Density Peaks Clustering (DPC) algorithm performs poorly on datasets that are high-dimensional, noisy, or structurally complex. To address this, an Improved Density Peaks Clustering Algorithm (IDPCA) combining K-Nearest Neighbors is proposed. First, a new local density measure is defined to describe how each sample is distributed in space. Second, the concept of a core point is introduced and, based on the K-Nearest Neighbors idea, a global search allocation strategy is designed that repeatedly assigns the unassigned K-Nearest Neighbors of core points to the correct cluster, which speeds up clustering. Third, a statistical learning allocation strategy based on K-Nearest Neighbors weighting is developed: the weighted information of each remaining point's K-Nearest Neighbors is used to compute the probability that the point belongs to each local cluster, which improves clustering quality. Finally, experiments on 21 typical test datasets, including synthetic and real-world datasets, show that IDPCA is broadly applicable and that it outperforms DPC and three other typical clustering algorithms on four evaluation indexes.

Key words: data mining, clustering algorithm, local density, density peaks, K-Nearest Neighbors
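
The abstract does not state the new density formula or the details of the two allocation strategies, so the sketch below is not IDPCA itself. It shows only the baseline density-peaks procedure that the paper builds on, with a K-nearest-neighbor density used as a stand-in. The function name `knn_density_peaks`, the parameters `k` and `n_clusters`, and the density definition (exponential of the negative mean squared distance to the k nearest neighbors) are all assumptions made for illustration.

```python
import numpy as np

def knn_density_peaks(X, k=5, n_clusters=2):
    """Baseline density-peaks clustering with an ASSUMED KNN-based density.

    Illustrative only: the density formula and the assignment rule here
    are stand-ins, not the definitions used by IDPCA.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumed local density: points whose k nearest neighbors are close
    # get a density near 1. Column 0 of each sorted row is the point itself.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn_dist ** 2).mean(axis=1))

    # delta_i: distance to the nearest point of higher density.
    order = np.argsort(-rho, kind="stable")  # densest first
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest point: use its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Cluster centers: largest gamma = rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Plain DPC assignment: each remaining point inherits the label of
    # its nearest higher-density neighbor, visited in density order.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `knn_density_peaks(X, k=5, n_clusters=2)` on an `(n, d)` array returns one label per sample and the indices of the chosen centers. The single-pass assignment at the end is the step the abstract proposes to replace with the core-point global search and the weighted K-nearest-neighbor probability assignment.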

XUE Xiaona, GAO Shuping, PENG Hongming, WU Huihui. Improved density peaks clustering algorithm combining K-Nearest Neighbors[J]. Computer Engineering and Applications, 2018, 54(7): 36-43.

