Computer Engineering and Applications, 2020, Vol. 56, Issue (16): 50-54. DOI: 10.3778/j.issn.1002-8331.1907-0334

• Theory and R&D •

K-means Clustering Algorithm Combining Max-Min Distance and Weighted Density

MA Keqin, YANG Yanjiao, QIN Hongwu, GENG Lin, WANG Pidong

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  • Publication date: 2020-08-15  Release date: 2020-08-11

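Abstract:

Both the random selection of initial cluster centers and the empirical setting of K affect K-means clustering results. To address this problem, a K-means clustering algorithm based on weighted density and max-min distance, called the KWDM algorithm, is proposed. KWDM uses a weighted density method to select the candidate set of initial cluster centers, which reduces the influence of outliers on the clustering result. It then selects cluster centers heuristically by the max-min distance criterion, to keep the clustering result from falling into a local optimum. Finally, it determines K with a criterion function, the ratio of intra-cluster distance to inter-cluster distance, so that K no longer has to be set from experience. Experimental results on artificial datasets and UCI datasets show that KWDM not only improves clustering accuracy but also reduces the average number of iterations and improves the stability of the algorithm.

Key words: K-means, initial center, outliers, density method, max-min distance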
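The abstract does not give the weighted-density formula, so the following Python sketch assumes one to show how a density filter and the max-min distance rule fit together. A point's score is the number of neighbors within a radius `r` divided by their mean distance, with `r` defaulting to half the mean pairwise distance; both choices are hypothetical. Points scoring at or above the mean form the candidate set, the densest candidate becomes the first center, and each further center is the candidate whose distance to its nearest already-chosen center is largest. The names `weighted_density` and `kwdm_style_init` are illustrative and do not come from the paper.

```python
import math


def weighted_density(points, radius):
    # Hypothetical weighted density (not the paper's formula): number of
    # neighbors within `radius` divided by the mean distance to them.
    scores = []
    for i, p in enumerate(points):
        near = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= radius]
        if not near:
            scores.append(0.0)  # isolated point: treated as an outlier
        else:
            scores.append(len(near) / (sum(near) / len(near) + 1e-12))
    return scores


def kwdm_style_init(points, k, radius=None):
    # Pick k initial centers: density filter first, then max-min distance.
    n = len(points)
    if radius is None:
        # Assumption: radius = half the mean pairwise distance.
        pairs = [math.dist(points[i], points[j])
                 for i in range(n) for j in range(i + 1, n)]
        radius = 0.5 * sum(pairs) / len(pairs)
    w = weighted_density(points, radius)
    mean_w = sum(w) / n
    # Candidate set: points at or above the mean score (a small tolerance
    # absorbs floating-point rounding when all scores tie).
    candidates = [i for i in range(n) if w[i] >= mean_w * (1 - 1e-9)]
    if len(candidates) < k:
        raise ValueError("fewer dense candidates than requested centers")
    # First center: the densest candidate.
    chosen = [max(candidates, key=lambda i: w[i])]
    # Max-min rule: the next center maximizes its distance to the nearest chosen one.
    while len(chosen) < k:
        rest = [i for i in candidates if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(
            math.dist(points[i], points[c]) for c in chosen)))
    return [points[i] for i in chosen]
```

On toy data made of three unit squares plus an isolated point at (50, 50), plain max-min selection started from a point in the lower-left square would jump to (50, 50) first; here the density filter drops that point from the candidate set before the max-min step runs.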

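The abstract names the criterion for K only as the ratio of intra-cluster distance to inter-cluster distance. The Python sketch below assumes intra = mean point-to-center distance and inter = smallest distance between two centers, scans K from 2 to a hypothetical upper bound `k_max`, and keeps the K with the smallest ratio. To stay self-contained it seeds each run with plain max-min (farthest-point) selection from the first point rather than a density-filtered initializer; all function names are illustrative.

```python
import math


def farthest_point_init(points, k):
    # Stand-in seeding (not the paper's KWDM initializer): plain max-min
    # distance selection starting from the first point.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    # Put each point in the cluster of its nearest center (ties go to the lower index).
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, centers, max_iter=100):
    # Standard Lloyd iterations; an empty cluster keeps its previous center.
    centers = [tuple(map(float, c)) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centers)]
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)


def intra_inter_ratio(points, centers, clusters):
    # Assumed definitions: intra = mean point-to-center distance,
    # inter = smallest distance between any two centers.
    intra = sum(math.dist(p, c) for c, cl in zip(centers, clusters) for p in cl) / len(points)
    inter = min(math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return math.inf if inter == 0 else intra / inter


def choose_k(points, k_max):
    # Try K = 2..k_max and keep the K whose clustering has the smallest ratio.
    best_k, best_ratio = None, math.inf
    for k in range(2, k_max + 1):
        centers, clusters = kmeans(points, farthest_point_init(points, k))
        ratio = intra_inter_ratio(points, centers, clusters)
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k, best_ratio
```

For twelve points forming three unit squares centered at (0.5, 0.5), (10.5, 10.5) and (20.5, 0.5), the scan settles on K = 3 with a ratio of 0.05. Taking the minimum center distance as the denominator is what penalizes over-splitting: once a true group is cut in two, two centers sit close together and the ratio rises sharply.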