Improved K-means Algorithm Based on Distance and Weight

doi:10.3778/j.issn.1002-8331.2009-0103

Computer Engineering and Applications, 2020, Vol. 56, Issue 23: 87-94. DOI: 10.3778/j.issn.1002-8331.2009-0103

WANG Zilong, LI Jin, SONG Yafei

1. Graduate College, Air Force Engineering University, Xi’an 710051, China
2. School of Air and Missile Defense, Air Force Engineering University, Xi’an 710051, China

Online: 2020-12-01 Published: 2020-11-30

Abstract:

The K-means clustering algorithm is simple, efficient, and widely used. However, the traditional algorithm selects its initial cluster centers at random, which makes it prone to converging to a local optimum, and the value of K must be set manually. To obtain more suitable initial cluster centers, an improved K-means algorithm based on distance and sample weight is proposed. The algorithm uses a dimension-weighted Euclidean distance to measure how far apart sample points are. After the density and weight of every sample have been calculated, the point with the highest density is taken as the first initial cluster center, and all samples in its cluster are removed from the data set. The next initial cluster center is then found from the previous cluster center and the weights of the remaining samples, using an introduced parameter τ_i. This process is repeated until the data set is empty, at which point k initial cluster centers have been obtained automatically. In experiments on UCI data sets, the proposed algorithm achieves better clustering results than the classical K-means, WK-means, ZK-means, and DCK-means algorithms.

Key words: data mining, K-means algorithm, initial cluster center, weighted Euclidean distance, weight product
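
The seeding procedure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas for density, sample weight, the dimension weights, or the parameter τ_i, so everything below is an assumption made for illustration: density is taken as the number of samples within a radius, the radius defaults to the mean pairwise distance, and the score of a remaining sample is its density multiplied by its distance to the previous center. τ_i is not modeled. The function names `weighted_distances` and `select_initial_centers` are hypothetical and do not come from the paper.

```python
import numpy as np


def weighted_distances(X, point, dim_weights):
    """Dimension-weighted Euclidean distance from `point` to every row of X."""
    diff = X - point
    return np.sqrt((dim_weights * diff ** 2).sum(axis=1))


def select_initial_centers(X, dim_weights=None, radius=None):
    """Greedily pick initial cluster centers; k is the number of centers found.

    Assumptions (not taken from the paper): density is the number of samples
    within `radius`, and the score of a remaining sample is its density times
    its distance to the previous center.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if n < 2:
        return X.copy()
    if dim_weights is None:
        dim_weights = np.ones(d)  # assumed: equal weights, plain Euclidean
    # Pairwise weighted distances, shape (n, n).
    D = np.stack([weighted_distances(X, x, dim_weights) for x in X])
    if radius is None:
        radius = D.sum() / (n * (n - 1))  # assumed: mean pairwise distance
    density = (D <= radius).sum(axis=1)

    remaining = np.ones(n, dtype=bool)
    centers = []
    current = int(np.argmax(density))  # first center: densest sample
    while True:
        centers.append(current)
        # Remove every sample in the neighbourhood of the new center.
        remaining &= D[current] > radius
        if not remaining.any():
            break
        # Assumed score for the next center, restricted to remaining samples.
        score = np.where(remaining, density * D[current], -np.inf)
        current = int(np.argmax(score))
    return X[centers]


if __name__ == "__main__":
    data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    centers = select_initial_centers(data)
    print(len(centers), "initial centers found")
    print(centers)
```

The number of rows returned plays the role of k, so no cluster count has to be supplied in advance. The centers could then seed an ordinary K-means run, for example by passing them as the `init` array of scikit-learn's `KMeans` with `n_clusters=len(centers)` and `n_init=1`.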

WANG Zilong, LI Jin, SONG Yafei. Improved K-means Algorithm Based on Distance and Weight[J]. Computer Engineering and Applications, 2020, 56(23): 87-94.
