Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (22): 100-102. DOI: 10.3778/j.issn.1002-8331.2009.22.033

• Database and Information Processing •

Semi-supervised clustering method with constraints

LIU Ying-dong

  1. School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Received: 2008-10-30 Revised: 2009-01-14 Online: 2009-08-01 Published: 2009-08-01
  • Contact: LIU Ying-dong

Abstract: In many practical data mining applications, unlabeled samples are plentiful and easy to obtain, whereas labeled samples are costly to produce, and it is sometimes impossible to obtain labels for all of the data. Semi-supervised clustering, which uses a small amount of labeled data to guide the clustering of unlabeled data, has therefore attracted considerable recent interest. This paper presents a new semi-supervised clustering algorithm based on constraint learning. The algorithm uses the information in the labeled data to set initial similarity and dissimilarity criteria for the data objects, adjusts these criteria automatically during clustering, and uses them to constrain and guide the clustering process. Tests on a standard Gaussian dataset show that, given a relatively small amount of supervision, the algorithm significantly improves both the accuracy and the speed of clustering compared with unsupervised clustering.

Key words: data mining, labeled data, constraints, semi-supervised clustering
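
The abstract does not spell out how the similarity and dissimilarity criteria are computed or adjusted, so the paper's algorithm cannot be reproduced from this page. As a rough illustration of the general idea only (a few labeled points steering the clustering of unlabeled Gaussian data), the sketch below uses a different and simpler technique, seeded k-means: the labeled points initialize the centroids and keep their given cluster throughout. The function names, the two-blob dataset, and the choice of seeds are all assumptions made for this example.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(points, seeds, max_iter=100):
    """Seeded k-means (an illustrative stand-in, not the paper's method).

    points: list of coordinate tuples.
    seeds:  dict mapping a point index to its known cluster id (0..k-1).
            Assumes every cluster id has at least one seed.
    Returns a list with one cluster id per point.
    """
    k = max(seeds.values()) + 1
    # Initial centroids come from the labeled points only.
    centroids = [
        mean([points[i] for i, c in seeds.items() if c == j]) for j in range(k)
    ]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Labeled points keep their label; others go to the nearest centroid.
        new_labels = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Seeds guarantee that no cluster is empty here.
        centroids = [
            mean([p for p, c in zip(points, labels) if c == j]) for j in range(k)
        ]
    return labels


# Hypothetical test data: two well-separated 2-D Gaussian blobs.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
blob_b = [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(50)]
points = blob_a + blob_b

# Only four of the 100 points are labeled.
seeds = {0: 0, 1: 0, 50: 1, 51: 1}
labels = seeded_kmeans(points, seeds)
```

Starting from centroids derived from labeled points, rather than from random ones, is one plausible reason a guided method can converge in fewer iterations than unsupervised clustering. Whether that is the mechanism behind the speed-up reported in the abstract is not stated on this page.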