计算机工程与应用 ›› 2021, Vol. 57 ›› Issue (22): 78-85.DOI: 10.3778/j.issn.1002-8331.2010-0407

• 理论与研发 • 上一篇    下一篇

基于密度峰值多起始中心的融合聚类算法

梅婕,魏圆圆,许桃胜   

  1. 1.中国科学院 合肥物质科学研究院 智能机械研究所,合肥 230031
    2.中国科学技术大学,合肥 230026
    3.安徽省智慧农业工程实验室,合肥 230031
  • 出版日期:2021-11-15 发布日期:2021-11-16

Fusion Clustering Algorithm Based on Multi-Prototypes Using Density Peaks

MEI Jie, WEI Yuanyuan, XU Taosheng   

  1. 1.Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
    2.University of Science and Technology of China, Hefei 230026, China
    3.Intelligent Agriculture Engineering Laboratory of Anhui Province, Hefei 230031, China
  • Online:2021-11-15 Published:2021-11-16

摘要:

经典[K]-Means算法不能有效处理非球型数据集的聚类问题,且聚类目标数需预先指定。SMCL(Self-adaptive Multiprototype-based Competitive Learning)算法是一种[K]-Means的改进算法,它引入Multi-Prototypes机制,并将距离相近的Prototypes所代表的样本簇融合成聚类簇。在SMCL算法基础上提出DP-SMCL(Density Peak-SMCL)算法,使用密度峰值聚类算法确定初始聚类中心集,借助1-D高斯混合概率密度模型合并以Prototypes为中心的相近子簇来获得精确聚类结果。实验结果表明,DP-SMCL算法可应用于非球型数据集聚类,且能自动确认聚类的目标类别数,相比于[K]-Means和DBSCAN(Density-Based Spatial Clustering of Applications with Noise)等经典聚类算法能够获得更加准确的聚类结果。同时,与SMCL算法相比,DP-SMCL可以快速完成初始Prototypes的选定,显著提升算法准确率和执行效率。

关键词: [K]-Means, Multi-Prototypes, 聚类, 1-D高斯混合概率密度模型, 非球型数据集

Abstract:

The classical [K]-Means algorithm cannot effectively deal with the clustering problems of aspheric datasets and requires to specify the number of clusters manually. SMCL(Self-adaptive Multiprototype-based Competitive Learning) algorithm is an improved algorithm based on [K]-Means, which employs the Multi-Prototypes mechanism into the [K]-Means algorithm framework. Multi-Prototypes represent a series of preselected samples as the central point of the initial sub-clusters, and the sample sub-clusters with highly similarities are merged into a cluster based on a certain algorithm rule. This paper proposes an improved algorithm DP-SMCL(Density Peak-SMCL) which introduces the density peaks into the original SMCL algorithm. The density peak clustering algorithm is used to determine the Prototypes in the initial stage of clustering. The ultimate accurate clustering result is obtained by merging the sub-clusters with the 1-D Gaussian mixture probability density model which evaluates the similarity distance between two sub-clusters. The experimental results show that the DP-SMCL algorithm is suitable for aspheric datasets and can derive more precisely clustering results than [K]-Means algorithm and DBSCAN(Density-Based Spatial Clustering of Applications with Noise) algorithm. The DP-SMCL can automatically determine the inherent clustering target number. Compared with SMCL algorithm, DP-SMCL algorithm can rapidly pick out the initial prototypes while possesses a high accuracy of clustering results and a huge progress in algorithm efficiency.

Key words: [K]-Means, Multi-Prototypes, clustering, 1-D Gaussian mixture probability density model, aspheric dataset