Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (23): 166-168.DOI: 10.3778/j.issn.1002-8331.2008.23.051

• 数据库、信号与信息处理 • Previous Articles     Next Articles

New genetic K-means clustering algorithm based on meliorated initial center

SUN Xiu-juan1,LIU Xi-yu2   

  1. 1.School of Information Science and Engineering,Shandong Normal University,Jinan 250014,China
    2.School of Management,Shandong Normal University,Jinan 250014,China
  • Received:2007-10-15 Revised:2008-01-21 Online:2008-08-11 Published:2008-08-11
  • Contact: SUN Xiu-juan

基于初始中心优化的遗传K-means聚类新算法

孙秀娟1,刘希玉2   

  1. 1.山东师范大学 信息科学与工程学院,济南 250014
    2.山东师范大学 管理学院,济南 250014
  • 通讯作者: 孙秀娟

Abstract: A good K-means clustering algorithm should meet two requirements at least.First,it can reflect the validity of clustering,in other words,clustering number consistents with the practical problems.Second,it has the ability to handle the noise.The traditional K-means algorithm is a local search algorithm,which is sensitive to initialization and easy to search a local maximum.To address this shortcoming,a new K-means algorithm is proposed to optimize the initial center.The algorithm finds k data objects,all of which are belong to high density area and the most far away to each other.Experiments show that the algorithm has not only the weak dependence on initial data,but also fast convergence and high clustering quality.To realize the validity of clustering and get clustering results of higher accuracy,the paper proposes a hybrid algorithm,which combines the optimal K-means algorithm and the genetic algorithm.The algorithm can automatically get the optimal value of k with high compact clusters and large separation between at least two clusters,and optimal k initial center in order to get better clustering,then continue to search iteratively to get the optimal solution.Experiments show that the hybrid method has better clustering quality and general performance.

Key words: clustering, K-means algorithm, genetic algorithm

摘要: 一个好的K-means聚类算法至少要满足两个要求:(1)能反映聚类的有效性,即所分类别数要与实际问题相符;(2)具有处理噪声数据的能力。传统的K-means算法是一种局部搜索算法,存在着对初始化敏感和容易陷入局部极值的缺点。针对此缺点,提出了一种优化初始中心的K-means算法,该算法选择相距最远的处于高密度区域的k个数据对象作为初始聚类中心。实验表明该算法不仅具有对初始数据的弱依赖性,而且具有收敛快,聚类质量高的特点。为体现聚类的有效性,获得更高精度的聚类结果,提出了将优化的K-means算法(PKM)和遗传算法相结合的混合算法(PGKM),该算法在提高紧凑度(类内距)和分离度(类间距)的同时自动搜索最佳聚类数k,对k个初始中心优化后再聚类,不断地循环迭代,得到满足终止条件的最优聚类。实验证明该算法具有更好的聚类质量和综合性能。

关键词: 聚类, K-means算法, 遗传算法