Improved K-Prototypes Clustering Algorithm

Computer Engineering and Applications, 2020, Vol. 56, Issue 21: 54-59. DOI: 10.3778/j.issn.1002-8331.1912-0106


SUN Zhiran, SU Hang, LIANG Yi

Faculty of Information, Beijing University of Technology, Beijing 100124, China

Online: 2020-11-01 Published: 2020-11-03


Abstract:

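In the K-Prototypes clustering algorithm, the initial cluster centers and the number of clusters are specified manually, which leads to low accuracy and poor stability. To address this problem, this paper proposes a K-Prototypes clustering algorithm based on density optimization. The algorithm adaptively optimizes the number of clusters and the choice of initial cluster centers according to the density distribution of the data objects. It also improves the dissimilarity formula by assigning each attribute its own weight according to its influence on the clustering result, which raises clustering accuracy. Experimental results on synthetic datasets and UCI datasets show that, compared with the K-Prototypes, DPCM and Fuzzy K-Prototypes algorithms, the proposed algorithm improves average accuracy by 8.52%, 4.28% and 8.33%, respectively, achieving comparatively good clustering results.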

Key words: clustering algorithm, initial center points, density, mixed attributes
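The abstract does not give the paper's formulas, so the following Python sketch only illustrates the two ideas it names, under explicit assumptions. The dissimilarity is assumed to be a Huang-style K-Prototypes measure (squared Euclidean distance on numeric attributes plus mismatch count on categorical attributes) with per-attribute weights `w_num` and `w_cat` supplied by the caller; how the authors actually compute the weights is not stated here. Initial prototypes are chosen with a density-peak heuristic in the style of Rodriguez and Laio (local density `rho` within a cutoff `dc`, distance `delta` to the nearest denser point, score `gamma = rho * delta`). The number of clusters is taken at the largest drop in sorted `gamma`. That rule, the integer coding of categorical values, and all function names (`mixed_distance`, `density_peak_centers`, `k_prototypes`) are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def mixed_distance(x_num, x_cat, c_num, c_cat, w_num, w_cat):
    """Hypothetical weighted mixed dissimilarity: weighted squared Euclidean
    distance on numeric attributes plus weighted mismatches on categorical
    attributes. Broadcasts over leading axes (point-to-point or matrices)."""
    num_part = np.sum(w_num * (x_num - c_num) ** 2, axis=-1)
    cat_part = np.sum(w_cat * (x_cat != c_cat), axis=-1)
    return num_part + cat_part


def density_peak_centers(X_num, X_cat, w_num, w_cat, dc):
    """Pick initial prototype indices (and hence k) with a density-peak heuristic."""
    D = mixed_distance(X_num[:, None, :], X_cat[:, None, :],
                       X_num[None, :, :], X_cat[None, :, :], w_num, w_cat)
    rho = (D < dc).sum(axis=1) - 1           # neighbours within cutoff, excluding self
    order = np.argsort(-rho, kind="stable")  # densest points first
    delta = np.empty(len(X_num))
    delta[order[0]] = D[order[0]].max()      # densest point: distance to farthest point
    for i in range(1, len(order)):
        # distance to the nearest point that is denser (earlier in the order)
        delta[order[i]] = D[order[i], order[:i]].min()
    gamma = rho * delta
    ranked = np.argsort(-gamma, kind="stable")
    drops = gamma[ranked[:-1]] - gamma[ranked[1:]]
    k = int(np.argmax(drops)) + 1            # hypothetical rule: cut at the largest drop
    return ranked[:k]


def _mode(column):
    """Most frequent value of one categorical column."""
    values, counts = np.unique(column, return_counts=True)
    return values[np.argmax(counts)]


def k_prototypes(X_num, X_cat, w_num, w_cat, dc, max_iter=100):
    """K-Prototypes iterations started from density-peak prototypes."""
    centers = density_peak_centers(X_num, X_cat, w_num, w_cat, dc)
    P_num = X_num[centers].astype(float)
    P_cat = X_cat[centers].copy()
    labels = None
    for _ in range(max_iter):
        D = mixed_distance(X_num[:, None, :], X_cat[:, None, :],
                           P_num[None, :, :], P_cat[None, :, :], w_num, w_cat)
        new_labels = D.argmin(axis=1)        # assign each object to its nearest prototype
        if labels is not None and np.array_equal(new_labels, labels):
            break                            # assignments stable: converged
        labels = new_labels
        for j in range(len(centers)):
            members = labels == j
            if members.any():                # keep the old prototype if a cluster empties
                P_num[j] = X_num[members].mean(axis=0)
                P_cat[j] = [_mode(col) for col in X_cat[members].T]
    return labels, P_num, P_cat
```

For example, with one numeric attribute taking the values 0.0 to 0.4 (category 0) and 10.0 to 10.4 (category 1), unit weights and `dc=0.05`, `density_peak_centers` returns the middle point of each group, so k = 2 is found without being specified, and `k_prototypes` then separates the two groups in a single update.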



SUN Zhiran, SU Hang, LIANG Yi. Improved K-Prototypes Clustering Algorithm[J]. Computer Engineering and Applications, 2020, 56(21): 54-59.
