Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (10): 161-168.DOI: 10.3778/j.issn.1002-8331.1808-0006

Previous Articles     Next Articles

Comparative Density Peaks Clustering Based on [K]-Nearest Neighbors

DU Pei, CHENG Xiaorong   

  1. School of Control and Computer Engineering, North China Electric Power University, Baoding, Hebei 071000, China
  • Online:2019-05-15 Published:2019-05-13

一种基于[K]近邻的比较密度峰值聚类算法

杜  沛,程晓荣   

  1. 华北电力大学 控制与计算机工程学院,河北 保定 071000

Abstract: The clustering effect of the Fast Search and Discovery Density Peak Clustering Algorithm(CFSFDP) relies heavily on the subjective setting of the truncation distance [dc], while the determination of the optimum value is not easy, and when dealing with the data sets with complex structure and large variations in density, the distinction generated by CFSFDP algorithm between the cluster center points and the non-cluster center points in the decision graph is not obvious enough, making the selection of the cluster centers difficult. Aiming at these problems, the algorithm is optimized and a Comparative Density Peak Clustering algorithm based on K-Nearest Neighbors(CDPC-KNN) is proposed. The algorithm combines the concept of K-nearest neighbors to redefine the measurement method of truncation distance and local density. It can adaptively generate the truncation distance for arbitrary datasets, and make the calculation results of local density more consistent with the real distribution of data. Meanwhile, the distance comparison quantity is introduced to replace the distance parameter, so that the cluster centers are more obvious on the decision graph. The experimental results show that the clustering effect of CDPC-KNN algorithm is better than CFSFDP algorithm and DBSCAN algorithm in general. The separation experiment shows that the new algorithm effectively improves the discrimination between cluster center points and non-cluster center points.

Key words: clustering algorithm, density peaks, K-nearest neighbors, decision graph, cluster centers

摘要: 快速搜索与发现密度峰值聚类算法(Fast Search and Discovery Density Peak Clustering Algorithm,CFSFDP)的聚类效果十分依赖截断距离[dc]的主观选取,而最佳[dc]值的确定并不容易,并且当处理分布复杂、密度变化大的数据集时,算法生成的决策图中类簇中心点与非类簇中心点的区分不够明显,使类簇中心的选取变得困难。针对这些问题,对其算法进行了优化,并提出了基于K近邻的比较密度峰值聚类算法(Comparative Density Peak Clustering algorithm Based on K-Nearest Neighbors,CDPC-KNN)。算法结合K近邻概念重新定义了截断距离和局部密度的度量方法,对任意数据集能自适应地生成截断距离,并使局部密度的计算结果更符合数据的真实分布。同时在决策图中引入距离比较量代替原距离参数,使类簇中心在决策图上更加明显。通过实验验证,CDPC-KNN算法的聚类效果整体上优于CFSFDP算法与DBSCAN算法,分离度实验表明新算法使类簇中心与非类簇中心点的区分度得到有效提高。

关键词: 聚类算法, 密度峰值, K近邻, 决策图, 类簇中心