Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (28): 137-139. DOI: 10.3778/j.issn.1002-8331.2009.28.041

• Database, Signal and Information Processing •

Semi-supervised improved K-means clustering algorithm
(Chinese title: 半监督的改进K-均值聚类算法)

WANG Jun (汪 军)1,2, WANG Chuan-yu (王传玉)2, ZHOU Ming-zheng (周鸣争)1

  1. Department of Computer Science & Engineering, Anhui University of Technology and Science (安徽工程科技学院), Wuhu, Anhui 241000, China
  2. Department of Math & Physics, Anhui University of Technology and Science (安徽工程科技学院), Wuhu, Anhui 241000, China
  • Received: 2009-04-08 Revised: 2009-06-11 Online: 2009-10-01 Published: 2009-10-01
  • Contact: WANG Jun

Abstract: The K-means clustering algorithm requires the number of clusters to be specified in advance. Random selection of the initial cluster centers makes the clustering results unstable, and the algorithm tends to terminate at a local optimum. To address these problems, an improved K-means clustering algorithm based on semi-supervised learning theory is proposed. It builds the minimum spanning tree of a graph from a small number of labeled samples and splits the tree iteratively to obtain the number of clusters and the initial cluster centers that K-means requires. Experiments on the IRIS data set show that, although the spanning trees built from different random samples differ and the resulting initial cluster centers differ as well, the clustering is consistent and stable, and fewer iterations are needed than with the traditional K-means algorithm. These results support the effectiveness of the semi-supervised improved K-means algorithm.

Key words: semi-supervised learning, K-means clustering, labeled sample, minimum spanning tree
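
The abstract describes the pipeline only in outline, so the following Python sketch is an illustration under stated assumptions and not the authors' implementation. It builds a minimum spanning tree over a few labeled samples, splits it, and uses the resulting components to seed K-means. The split rule (cut every tree edge joining two differently labeled samples), the use of component means as initial centers, the helper names, and the toy data are all assumptions introduced here; the paper's actual splitting criterion is not given in the abstract.

```python
import math

def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(group) for c in zip(*group))

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, dist) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def seeds_from_labeled(points, labels):
    """ASSUMED split rule: drop every MST edge whose endpoints have different
    labels. Each remaining connected component yields one initial center."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path compression
            i = root[i]
        return i

    for i, j, _ in mst_edges(points):
        if labels[i] == labels[j]:    # keep the edge: merge the two components
            root[find(i)] = find(j)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [mean(g) for g in groups.values()]

def kmeans(data, centers, max_iter=100):
    """Plain Lloyd iterations from the given centers; returns (centers, iterations)."""
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:    # assignments no longer change the centers
            return centers, iteration
        centers = new_centers
    return centers, max_iter

# Toy data (hypothetical): two well-separated groups, four labeled samples.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
labeled_points = [data[0], data[2], data[4], data[5]]
labeled_classes = ["a", "a", "b", "b"]

seeds = seeds_from_labeled(labeled_points, labeled_classes)
centers, iterations = kmeans(data, seeds)
print(len(seeds), centers, iterations)
```

In this sketch the number of clusters is never passed in: it falls out as the number of components left after the split, which mirrors the abstract's claim that the cluster count and the initial centers are both derived from the labeled samples.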
