Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (12): 162-163.

• Database and Information Processing •

Research on a Chinese Text Clustering Algorithm Based on HowNet

ZHAO Peng, CAI Qing-sheng

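  1. School of Computer Science, Anhui University; Department of Computer Science, Anhui University of Technology
  • Received: 2006-05-22 Revised: 1900-01-01 Published: 2007-04-20 Released: 2007-04-20
  • Corresponding author: ZHAO Peng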

Research of A Novel Chinese Text Clustering Algorithm Based on HowNet

ZHAO Peng¹,², CAI Qing-sheng²

  1. Key Lab. of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, Hefei 230039, China
  2. Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received: 2006-05-22 Revised: 1900-01-01 Online: 2007-04-20 Published: 2007-04-20

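Abstract: To address the problems of keyword-set-based Chinese text clustering algorithms, this paper introduces HowNet into the feature representation of Chinese text and, on that basis, proposes a HowNet-based Chinese text clustering algorithm. The algorithm adds HowNet-based concept features to the Chinese text representation. Experimental results show that it groups semantically related Chinese documents together more effectively and that, compared with traditional keyword-set-based Chinese text clustering algorithms, clustering quality improves considerably.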

Abstract: To address the problems of Chinese text clustering algorithms based on keyword sets, this paper introduces HowNet into the representation of Chinese text and presents a Chinese text clustering algorithm based on HowNet. The algorithm adds HowNet-based conceptual features to the representation of Chinese text. Experimental results show that the algorithm is better at placing semantically related Chinese texts in the same cluster and greatly improves the quality of text clustering.
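
The abstract does not give the details of the feature representation, so the following is only a minimal sketch of the general idea, under stated assumptions. It shows how adding concept features to a keyword vector can make two documents with no shared keywords look similar. The `TOY_CONCEPTS` lexicon is a hypothetical stand-in for HowNet (its entries are invented for illustration), and the additive weighting and cosine similarity are assumptions rather than the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for HowNet: word -> concept label.
# These entries are invented for illustration and are not taken from HowNet.
TOY_CONCEPTS = {
    "doctor": "medical",
    "hospital": "medical",
    "nurse": "medical",
    "clinic": "medical",
    "bank": "finance",
    "loan": "finance",
}


def keyword_features(tokens):
    """Plain keyword-set representation: term -> count."""
    return Counter(tokens)


def concept_features(tokens, lexicon=TOY_CONCEPTS, weight=1.0):
    """Keyword counts plus counts of the concepts those keywords map to."""
    features = Counter(tokens)
    for token in tokens:
        concept = lexicon.get(token)
        if concept is not None:
            # Prefix concept features so they cannot collide with keywords.
            features["concept:" + concept] += weight
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two documents on the same topic that share no keywords.
doc_a = ["doctor", "hospital"]
doc_b = ["nurse", "clinic"]

keyword_sim = cosine(keyword_features(doc_a), keyword_features(doc_b))
concept_sim = cosine(concept_features(doc_a), concept_features(doc_b))
```

With keywords alone the two documents have zero similarity, while the shared `concept:medical` feature gives them a positive similarity, so a clustering step built on these vectors could place them in the same cluster. A real implementation would need a Chinese word segmenter and actual HowNet lookups, neither of which is shown here.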