Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (9): 81-88. DOI: 10.3778/j.issn.1002-8331.2005-0011

Fusion of KNN Optimized Density Peaks and FCM Clustering Algorithm

LAN Hong, HUANG Min   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2021-05-01 Published: 2021-04-29

融合KNN优化的密度峰值和FCM聚类算法

兰红,黄敏   

  1. 江西理工大学 信息工程学院,江西 赣州 341000

Abstract:

The Fuzzy C-Means (FCM) clustering algorithm is sensitive to the initial cluster centers and to noise, clusters boundary samples inaccurately, and tends to converge to local minima. To address these problems, a fusion clustering algorithm (KDPC-FCM) is proposed that combines a K-Nearest-Neighbor (KNN) optimized Density Peaks Clustering (DPC) algorithm with FCM. The algorithm defines the local density of each sample from its KNN information, then quickly and accurately locates the density peak samples and uses them as the initial cluster centers, which mitigates the shortcomings of FCM and improves its clustering result. Experimental results on several UCI data sets, one synthetic data set, several benchmark data sets, and six relatively large data sets from the Geolife project show that, compared with the traditional FCM algorithm and the DSFCM algorithm, the proposed algorithm has better noise immunity, a better clustering effect, and a faster global convergence speed, which demonstrates its feasibility and effectiveness.
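
The pipeline described in the abstract (KNN-based local density, density peaks as initial centers, then FCM) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the density formula, so the exponential of the mean squared KNN distance used here is an assumption, as are the function names `knn_density_peaks` and `fcm`, the `rho * delta` ranking, and the default fuzzifier `m=2.0`.

```python
import numpy as np

def knn_density_peaks(X, k, n_clusters):
    """Pick initial centers as the samples with the largest rho * delta."""
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Distances to the k nearest neighbors (column 0 is the sample itself).
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]
    # ASSUMED density form: high when the k nearest neighbors are close.
    rho = np.exp(-np.mean(knn_d ** 2, axis=1))
    # delta: distance to the nearest sample of higher density (as in DPC);
    # the densest sample gets its largest distance to any other sample.
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    # Density peaks combine high rho with high delta.
    peaks = np.argsort(rho * delta)[::-1][:n_clusters]
    return X[peaks]

def fcm(X, centers, m=2.0, tol=1e-5, max_iter=100):
    """Standard FCM iterations started from the given centers."""
    centers = centers.copy()
    for _ in range(max_iter):
        # Sample-to-center distances; clipped to avoid division by zero.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Membership u_ij proportional to dist_ij^(-2/(m-1)); rows sum to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # New centers are membership-weighted means of the samples.
        w = u ** m
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # u holds the memberships computed in the last iteration.
    return centers, u
```

A call such as `fcm(X, knn_density_peaks(X, k=5, n_clusters=2))` replaces the random initialization of plain FCM with the density peak samples, which is the step the abstract credits for the reduced sensitivity to initial centers.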

Key words: Fuzzy C-Means (FCM), clustering, density peaks, K Nearest Neighbor (KNN), algorithm optimization

摘要:

针对模糊C均值(Fuzzy C-Means,FCM)聚类算法对初始聚类中心和噪声敏感、对边界样本聚类不够准确且易收敛于局部极小值等问题,提出了一种K邻近(KNN)优化的密度峰值(DPC)算法和FCM相结合的融合聚类算法(KDPC-FCM)。算法利用样本的K近邻信息定义样本局部密度,快速准确搜索样本的密度峰值点样本作为初始类簇中心,改善FCM聚类算法存在的不足,从而达到优化FCM聚类算法效果的目的。在多个UCI数据集、单个人造数据集、多种基准数据集和Geolife项目中的6个较大规模数据集上的实验结果表明,改进后的新算法与传统FCM算法、DSFCM算法对比,有着更好的抗噪性、聚类效果和更快的全局收敛速度,证明了新算法的可行性和有效性。

关键词: 模糊C均值, 聚类, 密度峰值, K近邻, 算法优化