Clustering Method by Combining Simplex Mapping and Entropy Weighting

doi:10.3778/j.issn.1002-8331.1901-0095

Computer Engineering and Applications, 2020, Vol. 56, Issue 9: 148-155.

AN Ning, JIANG Siyuan, TANG Chen, YANG Jiaoyun

1. National Smart Eldercare International S&T Cooperation Base, Hefei University of Technology, Hefei 230601, China
2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China

Online: 2020-05-01 Published: 2020-04-29

Abstract:

Because categorical and numerical attributes differ in nature, clustering algorithms for mixed-type data usually have to handle the two attribute types separately, which makes such algorithms harder to design and implement. In addition, different attributes carry different amounts of information, yet existing methods typically treat all attributes equally. This paper proposes a clustering algorithm for mixed-type data that combines simplex mapping with information-entropy weighting. Categorical attributes are mapped to high-dimensional numerical vectors based on simplex theory, information entropy is used to assign a weight to each attribute and build a similarity measure, and this measure is applied within the K-Means framework to obtain the clustering algorithm. Experiments on 6 mixed datasets from UCI show that the proposed algorithm outperforms the traditional mapping-based clustering algorithm and the K-Prototype algorithm, improving accuracy by 2.70% and 18.33%, respectively.

Key words: vector mapping, entropy-based weight, similarity measurement, mixed datasets, clustering analysis
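The pipeline summarized in the abstract (map categorical attributes to simplex vertices, weight attributes by entropy, cluster with K-Means) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact formulas, so several choices here are assumptions: scaled one-hot vectors stand in for the simplex mapping (they form a regular simplex with unit edge length), numerical attributes are min-max scaled and discretized into equal-width bins to estimate entropy, and each weight is taken proportional to the attribute's entropy. The function names `simplex_vertices`, `shannon_entropy`, `embed_mixed`, and `kmeans` are hypothetical.

```python
import numpy as np


def simplex_vertices(k):
    """Return k points in R^k with pairwise Euclidean distance exactly 1.

    Assumption: scaled one-hot vectors stand in for the paper's simplex mapping.
    """
    return np.eye(k) / np.sqrt(2.0)


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def embed_mixed(numeric, categorical, bins=4):
    """Embed mixed data as one numeric matrix with entropy-weighted blocks.

    numeric: (n, p) float array; categorical: (n, q) array of labels.
    Returns (features, weights) with one weight per original attribute.
    """
    blocks, entropies = [], []
    for j in range(numeric.shape[1]):
        col = numeric[:, j].astype(float)
        span = col.max() - col.min()
        scaled = (col - col.min()) / span if span > 0 else np.zeros_like(col)
        # Assumption: equal-width binning to estimate a numeric attribute's entropy.
        binned = np.minimum((scaled * bins).astype(int), bins - 1)
        blocks.append(scaled[:, None])
        entropies.append(shannon_entropy(binned))
    for j in range(categorical.shape[1]):
        values, codes = np.unique(categorical[:, j], return_inverse=True)
        blocks.append(simplex_vertices(len(values))[codes.ravel()])
        entropies.append(shannon_entropy(codes.ravel()))
    entropies = np.array(entropies)
    total = entropies.sum()
    # Assumption: weight proportional to entropy; uniform if all entropies are 0.
    if total > 0:
        weights = entropies / total
    else:
        weights = np.full(len(entropies), 1.0 / len(entropies))
    # Scaling a block by sqrt(w) weights its squared Euclidean distance by w.
    features = np.hstack([np.sqrt(w) * b for w, b in zip(weights, blocks)])
    return features, weights


def kmeans(features, k, iters=50, seed=0):
    """Plain Lloyd iterations on the embedded, weighted features."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            features[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy data: one numerical and one categorical attribute, two obvious groups.
numeric = np.array([[1.0], [1.2], [0.9], [9.0], [9.5], [8.8]])
categorical = np.array([["a"], ["a"], ["a"], ["b"], ["b"], ["b"]])
features, weights = embed_mixed(numeric, categorical)
labels, _ = kmeans(features, k=2)
print(weights, labels)
```

Folding the weights into the features (multiplying each attribute's block by the square root of its weight) lets an unmodified K-Means loop minimize the weighted distance, which is one simple way to plug a weighted similarity measure into the K-Means framework.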

AN Ning, JIANG Siyuan, TANG Chen, YANG Jiaoyun. Clustering Method by Combining Simplex Mapping and Entropy Weighting[J]. Computer Engineering and Applications, 2020, 56(9): 148-155.

