Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (34): 144-146. DOI: 10.3778/j.issn.1002-8331.2009.34.044
• Database, Signal and Information Processing •
LIU Jin-ling
刘金岭
Abstract: The proposed algorithm first computes the similarity between Chinese short messages, then uses the Isomap method to obtain an embedding of the messages in a semantic space, and finally performs cluster analysis on the messages in this low-dimensional embedding. The algorithm overcomes the representation-level difficulties that traditional cluster analysis of short messages encounters, and it also overcomes the weakness of word-frequency statistics, which cannot group messages with similar meanings together. Experimental results indicate that the algorithm is effective.
Key words: short message clustering, Isomap algorithm, semantic space
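The three steps named in the abstract (pairwise similarity, Isomap embedding, clustering in the low-dimensional space) can be illustrated with a minimal, standard-library Python sketch. It is not the paper's implementation: the abstract does not specify the similarity measure or the clustering method, so the character-bigram Jaccard similarity, the k-means step, the parameter values (`k=3` neighbours, 2 dimensions, 2 clusters), the sample messages, and all function names below are assumptions chosen only for illustration.

```python
import math

def bigrams(text):
    """Character bigrams of a message; a stand-in for real word segmentation."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def distance_matrix(msgs):
    """Pairwise distance = 1 - Jaccard similarity of bigram sets (assumed measure)."""
    sets = [bigrams(m) for m in msgs]
    n = len(sets)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            union = sets[i] | sets[j]
            sim = len(sets[i] & sets[j]) / len(union) if union else 1.0
            d[i][j] = d[j][i] = 1.0 - sim
    return d

def geodesic(d, k):
    """Isomap graph step: k-nearest-neighbour graph plus Floyd-Warshall paths."""
    n = len(d)
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in sorted(others, key=lambda j: d[i][j])[:k]:
            g[i][j] = g[j][i] = d[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                g[i][j] = min(g[i][j], g[i][m] + g[m][j])
    if any(math.isinf(x) for row in g for x in row):
        raise ValueError("neighbourhood graph is disconnected; increase k")
    return g

def classical_mds(g, dim, iters=200):
    """Isomap embedding step: double-centre the squared distances, then take
    the leading eigenvectors by power iteration with deflation."""
    n = len(g)
    sq = [[x * x for x in row] for row in g]
    mean_row = [sum(r) / n for r in sq]
    mean_all = sum(mean_row) / n
    b = [[-0.5 * (sq[i][j] - mean_row[i] - mean_row[j] + mean_all)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _axis in range(dim):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs repeatable
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))  # a negative eigenvalue adds no coordinate
        for i in range(n):
            coords[i].append(v[i] * scale)
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Hypothetical sample messages: three about a meeting, three about dinner.
messages = [
    "明天下午三点开会",        # "Meeting at 3 pm tomorrow"
    "明天下午三点开会请准时",  # "Meeting at 3 pm tomorrow, please be on time"
    "通知明天下午三点开会",    # "Notice: meeting at 3 pm tomorrow"
    "今晚一起吃火锅",          # "Hot pot together tonight"
    "今晚一起吃火锅好吗",      # "Hot pot together tonight, shall we?"
    "我们今晚一起吃火锅",      # "Let's have hot pot together tonight"
]
embedding = classical_mds(geodesic(distance_matrix(messages), k=3), dim=2)
labels = kmeans(embedding, k=2)
print(labels)
```

On this toy input the within-topic distances are much smaller than the cross-topic ones, so the meeting messages and the dinner messages should receive different cluster labels. A real system would likely replace the bigram similarity with a semantic similarity measure and choose the neighbourhood size and embedding dimension empirically.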
CLC Number: TP311
LIU Jin-ling. Chinese short messages text clustering algorithm based on Isomap[J]. Computer Engineering and Applications, 2009, 45(34): 144-146.
刘金岭. 基于Isomap的中文短信文本聚类算法[J]. 计算机工程与应用, 2009, 45(34): 144-146.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2009.34.044
http://cea.ceaj.org/EN/Y2009/V45/I34/144