Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (15): 129-133.

• Database, Data Mining, Machine Learning •

基于聚类融合的标题文本聚类方法

杨  威,朱福喜   

  1. 武汉大学 计算机学院,武汉 430072
  • 出版日期:2015-08-01 发布日期:2015-08-14

Title text clustering method based on clustering ensemble

YANG Wei, ZHU Fuxi   

  1. Computer School of Wuhan University, Wuhan 430072, China
  • Online: 2015-08-01 Published: 2015-08-14

摘要: 针对标题文本聚类中的聚类结果不稳定问题,提出一种基于聚类融合的标题文本聚类方法。该方法对标题文本的特征词进行筛选,将标题文本转化为特征词集合;提出基于统计和语义的相似度计算方法,计算特征词集合间的相似度;引入基于共协矩阵的聚类融合算法,得出聚类结果。实验结果表明,和传统聚类算法相比,该方法提升了标题文本聚类的稳定性。

关键词: 标题文本, 聚类融合, 聚类稳定性

Abstract: To address the instability of clustering results in title text clustering, a title text clustering method based on clustering ensemble is proposed. The method first filters the feature words of each title text and converts the title into a set of feature words. It then computes the similarity between feature word sets with a measure that combines statistical and semantic information. Finally, it applies a clustering ensemble algorithm based on a co-association matrix to obtain the clustering result. Experimental results show that, compared with traditional clustering algorithms, the method improves the stability of title text clustering.

Keywords: title text, clustering ensemble, clustering stability
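
The abstract names a clustering ensemble built on a co-association matrix but does not spell out the construction. The sketch below shows one common, generic form of the idea and is not the authors' algorithm: each entry of the matrix is the fraction of base clusterings that place two titles in the same cluster, and titles whose co-association exceeds a threshold are merged into one final cluster. The function names, the 0.5 threshold, and the three base clusterings are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def co_association(labelings):
    """Return the n x n matrix whose (i, j) entry is the fraction of
    base clusterings that put items i and j in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Merge items whose co-association is above the threshold
    (assumed value) and return one final label per item."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three hypothetical base clusterings of five titles.
# Label values are arbitrary within each run.
base_runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
matrix = co_association(base_runs)
final = consensus_labels(matrix)
print(final)  # → [0, 0, 1, 1, 1]
```

In this toy input the three runs disagree about titles 2 and 4, yet the majority vote encoded in the matrix still yields a single partition. That is the sense in which an ensemble can give a more stable result than any one run.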