Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (32): 56-58.

• Research and Discussion •

Approach to selecting initial centers for K-Means with variable threshold

LIU Yiming (刘一鸣), ZHANG Huaxiang (张化祥)

  1. School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
  • Online: 2011-11-11 Published: 2011-11-11

Abstract: The K-Means algorithm selects its initial cluster centers at random, which makes the performance of the resulting clusterer unstable. To address this, an initial cluster center selection method based on a variable threshold (VTK-Means) is proposed. The algorithm selects as initial cluster centers the samples whose distances to the already selected initial points are greater than a threshold, and adjusts the threshold according to the number of initial cluster centers that satisfy this condition. Experimental results on 10 UCI machine learning data sets indicate that the algorithm is more stable than the standard K-Means algorithm and clearly outperforms it.

Key words: K-Means, clustering, variable threshold, initial cluster center
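
The abstract describes the selection rule only in outline, so the following is a minimal sketch of the idea rather than the authors' VTK-Means procedure. It assumes Euclidean distance, uses the first sample as the seed center, and adjusts the threshold by bisection (raising it when too many samples qualify, lowering it when too few do). The function names, the seeding choice, the bisection scheme, and the `max_rounds` parameter are all hypothetical and do not come from the paper.

```python
import math


def greedy_centers(points, threshold):
    """Scan points in order, keeping each point that lies farther than
    `threshold` from every center kept so far."""
    centers = [points[0]]  # assumption: the first sample seeds the selection
    for p in points[1:]:
        if all(math.dist(p, c) > threshold for c in centers):
            centers.append(p)
    return centers


def select_initial_centers(points, k, max_rounds=50):
    """Pick k initial centers, adjusting the distance threshold until the
    greedy scan yields exactly k of them (hypothetical adjustment scheme)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    lo = 0.0
    # Largest pairwise distance is an upper bound on any useful threshold.
    hi = max(math.dist(p, q) for p in points for q in points)
    fallback = points[:k]  # used only if no threshold gives exactly k centers
    for _ in range(max_rounds):
        threshold = (lo + hi) / 2.0
        centers = greedy_centers(points, threshold)
        if len(centers) == k:
            return centers
        if len(centers) > k:
            fallback = centers[:k]  # remember a usable over-full selection
            lo = threshold          # too many candidates: raise the threshold
        else:
            hi = threshold          # too few candidates: lower the threshold
    return fallback


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (1, 0),
               (10, 10), (10, 11), (11, 10),
               (20, 0), (20, 1), (21, 0)]
    print(select_initial_centers(samples, 3))  # [(0, 0), (10, 10), (20, 0)]
```

The returned points could then be passed to any K-Means implementation that accepts explicit initial centers, in place of a random draw. The pairwise-distance bound is quadratic in the number of samples, so a larger data set would need a cheaper estimate of the starting threshold.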