Comparative Density Peaks Clustering Based on K-Nearest Neighbors

doi:10.3778/j.issn.1002-8331.1808-0006

Computer Engineering and Applications, 2019, Vol. 55, Issue 10: 161-168.



DU Pei, CHENG Xiaorong

School of Control and Computer Engineering, North China Electric Power University, Baoding, Hebei 071000, China

Online: 2019-05-15 Published: 2019-05-13

Abstract


The clustering quality of the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm depends heavily on a subjectively chosen cutoff distance dc, and the best value of dc is hard to determine. In addition, on data sets with complex distributions and large variations in density, the decision graph produced by the algorithm does not separate cluster centers from non-center points clearly enough, which makes the cluster centers difficult to select. To address these problems, the algorithm is optimized and a Comparative Density Peaks Clustering algorithm based on K-Nearest Neighbors (CDPC-KNN) is proposed. The algorithm uses the K-nearest-neighbor concept to redefine how the cutoff distance and the local density are measured: it generates the cutoff distance adaptively for an arbitrary data set, and its local density values agree more closely with the true distribution of the data. In addition, a distance comparison quantity replaces the original distance parameter in the decision graph, so that the cluster centers stand out more clearly. Experimental results show that CDPC-KNN clusters better overall than the CFSFDP and DBSCAN algorithms, and a separation experiment shows that the new algorithm effectively improves the distinction between cluster centers and non-center points.

Key words: clustering algorithm, density peaks, K-nearest neighbors, decision graph, cluster centers
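
As an illustration of the quantities the abstract refers to, the sketch below computes the two axes of a standard CFSFDP decision graph: a local density (rho) and the distance to the nearest point of higher density (delta). The abstract does not state the CDPC-KNN formulas, so the K-nearest-neighbor pieces are assumptions made for this example only: `knn_cutoff` (mean distance to the k-th neighbor) and `knn_density` (a Gaussian kernel summed over the k nearest neighbors) are hypothetical stand-ins, not the authors' definitions. The distance comparison quantity is left out because its definition is not given here.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_cutoff(dist, k):
    """ASSUMPTION (not from the paper): use the mean distance to the
    k-th nearest neighbor as a data-driven cutoff. Requires k < n."""
    # After sorting, column 0 of each row is the point itself (distance 0).
    kth = np.sort(dist, axis=1)[:, k]
    return kth.mean()


def knn_density(dist, k, dc):
    """ASSUMPTION (not from the paper): Gaussian-kernel density summed
    over the k nearest neighbors of each point."""
    nearest = np.sort(dist, axis=1)[:, 1:k + 1]
    return np.exp(-(nearest / dc) ** 2).sum(axis=1)


def delta_distance(dist, rho):
    """Standard CFSFDP delta: distance to the nearest point of strictly
    higher density. A point with no denser point (the global peak, or
    points tied with it) gets its largest distance to any other point."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return delta


# Two small, well-separated groups; each has one central point
# (index 0 and index 5) surrounded by four neighbors.
group = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
points = np.vstack([group, group + 100.0])

dist = pairwise_distances(points)
dc = knn_cutoff(dist, k=2)
rho = knn_density(dist, k=2, dc=dc)
delta = delta_distance(dist, rho)

# Points with both large rho and large delta are candidate cluster centers.
candidate_centers = np.argsort(rho * delta)[-2:]
print(sorted(candidate_centers.tolist()))
```

Ranking by the product of rho and delta is one common heuristic for reading a decision graph; in practice the centers are often picked by inspecting the plot itself.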


DU Pei, CHENG Xiaorong. Comparative Density Peaks Clustering Based on K-Nearest Neighbors[J]. Computer Engineering and Applications, 2019, 55(10): 161-168.

