Computer Engineering and Applications, 2010, Vol. 46, Issue (17): 150-152. DOI: 10.3778/j.issn.1002-8331.2010.17.042

• Database, Signal and Information Processing •

Improved k-means initial clustering center selection algorithm

HAN Ling-bo¹, WANG Qiang², JIANG Zheng-feng², HAO Zhi-qiang²

  1. Department of Theory and Information, Zhanjiang Party Institute, Zhanjiang, Guangdong 524032, China
  2. College of Computer Science and Information Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
  • Received: 2008-11-28 Revised: 2009-02-27 Online: 2010-06-11 Published: 2010-06-11
  • Contact: HAN Ling-bo


Abstract: The traditional k-means algorithm is sensitive to the choice of initial cluster centers: the clustering result fluctuates as the initial centers change. To address this drawback, an improved algorithm for selecting the initial cluster centers is proposed. The algorithm computes a density parameter for every data object and then chooses k objects located in high-density regions as the initial cluster centers. Experiments on standard UCI datasets, with the number of clusters given, show that k-means initialized in this way achieves relatively higher accuracy and stability than k-means with randomly selected initial centers.

Key words: k-means algorithm, clustering center, density parameter
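
The abstract does not define the density parameter or the exact selection rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the density of a point is the number of other points within a fixed radius, that the radius defaults to the mean pairwise distance, and that each chosen center excludes its own neighborhood so the k centers do not all fall in one dense region. The function names `density_initial_centers` and `kmeans` are hypothetical.

```python
import numpy as np


def density_initial_centers(X, k, radius=None):
    """Pick k initial centers from high-density regions of X (n_samples, n_features).

    Assumption: density = number of other points within `radius`.
    Assumption: `radius` defaults to the mean pairwise Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")

    # Full pairwise Euclidean distance matrix (O(n^2) memory; fine for small data).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    if radius is None:
        radius = dist.sum() / (n * (n - 1)) if n > 1 else 0.0

    # Count neighbors within the radius, excluding the point itself.
    density = (dist <= radius).sum(axis=1) - 1

    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        if not available.any():
            # Every point lies near an existing center: allow any unchosen point.
            available = np.ones(n, dtype=bool)
            available[chosen] = False
        candidates = np.where(available)[0]
        best = candidates[np.argmax(density[candidates])]
        chosen.append(best)
        # Exclude the neighborhood of the new center (assumed heuristic).
        available &= dist[best] > radius
    return X[chosen]


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment against the returned centers.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), centers
```

A call such as `labels, centers = kmeans(X, density_initial_centers(X, k))` runs k-means from the density-based centers. Because the selection is deterministic, repeated runs on the same data give the same result, which is the kind of stability the abstract contrasts with random initialization.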

