Computer Engineering and Applications, 2009, Vol. 45, Issue (20): 154-157. DOI: 10.3778/j.issn.1002-8331.2009.20.046

• Database and Information Processing •

Semi-supervised kernel clustering algorithm based on seed set

LI Kun-lun, ZHANG Chao, CAO Zheng, LIU Ming

  1. College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, China
  • Received: 2009-01-09 Revised: 2009-04-01 Online: 2009-07-11 Published: 2009-07-11
  • Contact: LI Kun-lun

Abstract: This paper presents a new semi-supervised kernel clustering algorithm, Seed Kernel K-Means (SKK-Means). The algorithm forms a seed set from a number of labeled samples and uses it as supervision to initialize the cluster centers of K-Means, guide the clustering process, and constrain the data partition. It also applies the kernel method to map the input data into a high-dimensional feature space, where the distance between samples is computed with a kernel function. Numerical experiments on UCI data sets compare the algorithm with K-Means and Kernel K-Means.

Key words: semi-supervised clustering, seed set, kernel method, K-means
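
The idea summarized in the abstract (seed-based initialization combined with kernel-computed distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel, the `gamma` parameter, the function names `rbf_kernel` and `seeded_kernel_kmeans`, the rule that seeds keep their given labels in every iteration, and the requirement that each cluster has at least one seed are all assumptions made for illustration. The distance to a cluster center in feature space is expanded as K(x, x) - 2 * mean_j K(x, x_j) + mean_{j,l} K(x_j, x_l), so the mapping itself is never computed explicitly.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def seeded_kernel_kmeans(K, seed_idx, seed_labels, n_clusters, max_iter=100):
    """Illustrative seeded kernel K-means on a precomputed Gram matrix K.

    Assumptions (not taken from the paper): every cluster has at least
    one seed, and seeds keep their given label in every iteration.
    """
    n = K.shape[0]
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    diag = np.diag(K)

    # Initially only the seeds are assigned; -1 marks unassigned points,
    # so the first cluster centers are the feature-space means of the seeds.
    labels = np.full(n, -1)
    labels[seed_idx] = seed_labels

    for _ in range(max_iter):
        dist = np.empty((n, n_clusters))
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            # ||phi(x_i) - m_c||^2 written with kernel evaluations only.
            dist[:, c] = (diag
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        new_labels[seed_idx] = seed_labels  # seeds constrain the partition
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

A call such as `seeded_kernel_kmeans(rbf_kernel(X, gamma=0.5), [0, 20], [0, 1], 2)` would treat samples 0 and 20 as one-element seed sets for clusters 0 and 1. Fixing the seed labels is only one possible way to "constrain the data partition"; a variant that uses seeds for initialization alone would drop the line that resets `new_labels[seed_idx]`.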