Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (21): 54-59. DOI: 10.3778/j.issn.1002-8331.1912-0106

• Theory, Research and Development •

An Improved K-Prototypes Clustering Algorithm

孙志冉,苏航,梁毅   

  1. Faculty of Information, Beijing University of Technology, Beijing 100124
  • Publication date: 2020-11-01 Release date: 2020-11-03

Improved K-Prototypes Clustering Algorithm

SUN Zhiran, SU Hang, LIANG Yi   

  1. Faculty of Information, Beijing University of Technology, Beijing 100124, China
  • Online: 2020-11-01 Published: 2020-11-03

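Abstract:

To address the low accuracy and stability of the K-Prototypes clustering algorithm caused by manually specifying the initial cluster centers and the number of clusters, a K-Prototypes clustering algorithm based on density optimization is proposed. Based on the density distribution of the data objects, the algorithm adaptively optimizes the number of clusters and the initial cluster centers, and it improves the dissimilarity formula by distinguishing the different weights with which each attribute influences the clustering result, thereby improving clustering accuracy. Experiments on synthetic data sets and UCI data sets show that, compared with the K-Prototypes, DPCM and Fuzzy K-Prototypes algorithms, the proposed algorithm improves average accuracy by 8.52%, 4.28% and 8.33%, respectively, achieving comparatively good clustering results.

Key words: clustering algorithm, initial center points, density, mixed attributes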

Abstract:

The K-Prototypes clustering algorithm requires the initial cluster centers and the number of clusters to be specified manually, which leads to low accuracy and poor stability. To solve these problems, this paper proposes a K-Prototypes clustering algorithm based on density optimization. The algorithm adaptively optimizes the number of clusters and the initial cluster centers according to the density distribution of the data objects. It also improves the dissimilarity formula by distinguishing the different weights with which each attribute influences the clustering result, which improves clustering accuracy. Experimental results on synthetic data sets and UCI data sets show that the proposed method achieves better clustering results: compared with K-Prototypes, DPCM and Fuzzy K-Prototypes, its average accuracy is improved by 8.52%, 4.28% and 8.33%, respectively.

Key words: clustering algorithm, initial center points, density peak, mixed attributes
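
The two ideas named in the abstract, an attribute-weighted dissimilarity for mixed numeric and categorical data and a density-based choice of initial centers, can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not give their formulas, so the sketch assumes the standard K-Prototypes cost (squared Euclidean distance plus simple matching) with per-attribute weights, and a density-peak style heuristic in which objects that are both dense and far from any denser object become centers. The function names, the weights `w_num` and `w_cat`, the balance factor `gamma`, and the thresholds `cutoff`, `min_rho` and `min_delta` are all hypothetical.

```python
from typing import Hashable, List, Sequence


def mixed_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[Hashable],
    proto_num: Sequence[float],
    proto_cat: Sequence[Hashable],
    w_num: Sequence[float],
    w_cat: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Weighted K-Prototypes-style dissimilarity (assumed form, not the paper's)."""
    # Numeric attributes: weighted squared Euclidean distance.
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, proto_num))
    # Categorical attributes: weighted simple matching (weight counted on mismatch).
    categorical = sum(w for w, a, b in zip(w_cat, x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


def density_peak_centers(
    dist: Sequence[Sequence[float]],
    cutoff: float,
    min_rho: int,
    min_delta: float,
) -> List[int]:
    """Pick center indices from a pairwise distance matrix (assumed heuristic)."""
    n = len(dist)
    # Local density: how many other objects lie closer than the cutoff.
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    # Delta: distance to the nearest object with strictly higher density.
    # Objects with no denser neighbor get the largest distance in their row.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    # Centers are dense objects that are far from any denser object, so the
    # number of clusters follows from the thresholds instead of being fixed.
    return [i for i in range(n) if rho[i] >= min_rho and delta[i] >= min_delta]
```

In this sketch the number of clusters is simply the length of the list returned by `density_peak_centers`, and the selected objects would seed an ordinary K-Prototypes iteration that uses `mixed_dissimilarity` for assignment. How the thresholds and attribute weights are actually derived is specific to the paper and is not reproduced here.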