Computer Engineering and Applications (计算机工程与应用), 2009, Vol. 45, Issue 34: 144-146. DOI: 10.3778/j.issn.1002-8331.2009.34.044

• Database, Signal and Information Processing •

基于Isomap的中文短信文本聚类算法

刘金岭   

  1. 淮阴工学院,江苏 淮安 223001
  • 收稿日期:2008-08-22 修回日期:2008-11-10 出版日期:2009-12-01 发布日期:2009-12-01
  • 通讯作者: 刘金岭

Chinese short messages text clustering algorithm based on Isomap

LIU Jin-ling   

  1. Huaiyin Institute of Technology, Huai’an, Jiangsu 223001, China
  • Received: 2008-08-22 Revised: 2008-11-10 Online: 2009-12-01 Published: 2009-12-01
  • Contact: LIU Jin-ling

摘要: 给出的算法思想是首先计算出中文短信的相似度,再通过使用Isomap方法得到短信在语义空间中的嵌入情况,然后将短信在低维嵌入上进行聚类分析。该算法克服了短信的传统聚类分析在表示层次上遇到的困难,也克服了词频统计法不能将内容意思相似的短信聚集在一起的缺点,实验表明该算法是行之有效的。

关键词: 短信聚类, Isomap算法, 语义空间

Abstract: The proposed algorithm first computes the similarity between Chinese short messages, then uses the Isomap method to obtain an embedding of the messages in a semantic space, and finally performs cluster analysis on that low-dimensional embedding. The algorithm overcomes the difficulties that traditional clustering of short messages encounters at the representation level. It also overcomes a weakness of word-frequency statistics, which cannot group messages with similar meaning together. Experimental results indicate that the algorithm is effective.

Key words: short messages clustering, Isomap algorithm, semantic space
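
The three steps named in the abstract (similarity, Isomap embedding, clustering in the low-dimensional space) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the abstract does not state which similarity measure or clustering method is used, so character-bigram Jaccard similarity and k-means are assumed here as stand-ins. All function names, the sample messages, and the handling of disconnected neighbourhood graphs are likewise hypothetical.

```python
import math

def bigrams(text):
    # Character bigrams: an assumed stand-in for the paper's similarity features.
    return {text[i:i + 2] for i in range(len(text) - 1)}

def distance_matrix(messages):
    # Jaccard distance (1 - similarity) between every pair of messages.
    sets = [bigrams(m) for m in messages]
    n = len(sets)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            union = sets[i] | sets[j]
            sim = len(sets[i] & sets[j]) / len(union) if union else 1.0
            dist[i][j] = dist[j][i] = 1.0 - sim
    return dist

def geodesic(dist, k):
    # Isomap: k-nearest-neighbour graph, then Floyd-Warshall shortest paths.
    n, inf = len(dist), float("inf")
    g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            g[i][j] = g[j][i] = dist[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                g[i][j] = min(g[i][j], g[i][m] + g[m][j])
    # Assumption of this sketch: unreachable pairs get twice the largest finite distance.
    cap = 2.0 * max(v for row in g for v in row if v < inf)
    return [[v if v < inf else cap for v in row] for row in g]

def classical_mds(geo, dims, iters=300):
    # Isomap: double-centre the squared geodesics, keep the top eigenvectors.
    n = len(geo)
    sq = [[v * v for v in row] for row in geo]
    row_mean = [sum(r) / n for r in sq]
    total = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[] for _ in range(n)]
    for _dim in range(dims):
        v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
        for _step in range(iters):  # power iteration
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate so the next pass converges to the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

def kmeans(points, k, iters=20):
    # Farthest-point initialisation keeps the sketch deterministic.
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(d2(p, c) for c in centres)))
    labels = [0] * len(points)
    for _step in range(iters):
        labels = [min(range(k), key=lambda c: d2(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def cluster_messages(messages, n_clusters, k=2, dims=2):
    geo = geodesic(distance_matrix(messages), k)
    return kmeans(classical_mds(geo, dims), n_clusters)

# Hypothetical sample messages: three about dinner, three about a meeting.
messages = ["今晚一起吃饭吗", "晚上一起吃饭吧", "明天一起吃饭好吗",
            "会议改到下午三点", "下午三点开会", "会议时间改到三点"]
labels = cluster_messages(messages, n_clusters=2)
print(labels)
```

On this toy input the dinner messages and the meeting messages share no bigrams across topics, so the embedding separates them along its first axis. Real short-message data would likely need a semantic similarity measure rather than surface bigrams, which is the gap the abstract says the algorithm addresses.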
