Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (7): 132-134.

• Database, Signal and Information Processing •

A lexical-chain-based method for extracting theme statements from Chinese short messages

刘金岭,冯万利,张永军   

  1. School of Computer Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2012-03-01 Published: 2012-03-01

Research of theme statement extraction for Chinese short message text based on lexical chain

LIU Jinling, FENG Wanli, ZHANG Yongjun   

  1. School of Computer Engineering, Huaiyin Institute of Technology, Huai’an, Jiangsu 223003, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2012-03-01 Published:2012-03-01

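Abstract: A lexical-chain-based method for extracting the topic of Chinese short message (SMS) texts is proposed. The method first constructs multiple lexical chains to express the narrative threads of an SMS text, then extracts from them the chains that are rich in topic information and uses these as the keyword sequences for constructing the text's theme statements. Experiments show that the extracted topics cover the information in an SMS text more comprehensively and eliminate the redundancy of several keyword sequences expressing the same topic. The method performs markedly better than topic extraction based on statistical information.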

Abstract: An algorithm for extracting topics from Chinese SMS texts based on lexical chains is proposed. Lexical chains are constructed for each SMS text to reflect its multiple narrative threads; the strong chains are then extracted to represent the main content of the text and serve as the keyword sequences from which its theme statement is built. Experiments demonstrate that the topics extracted by this algorithm cover the information in an SMS text more completely, and that the algorithm removes the redundancy that arises when different keyword sequences express the same meaning. The method outperforms extraction methods that rely on statistics.
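
The pipeline outlined in the abstract (build lexical chains, keep the strong ones, read them as keyword sequences) can be illustrated with a minimal Python sketch. The abstract does not specify the relatedness measure, the chain-scoring rule, or the selection threshold, so everything below is an assumption made for illustration: `RELATED_GROUPS` is a hypothetical hand-written stand-in for a thesaurus, `chain_score` is a toy score, `top_k` is a hypothetical cut-off, and the input is assumed to be already segmented into words (English tokens are used here for readability).

```python
# Hypothetical relatedness resource: each set groups words treated as related.
# A real system would use a thesaurus or a semantic-similarity measure instead.
RELATED_GROUPS = [
    {"meeting", "conference", "agenda"},
    {"train", "ticket", "station"},
]


def related(a, b):
    """Return True if two words are identical or share a hypothetical group."""
    if a == b:
        return True
    return any(a in group and b in group for group in RELATED_GROUPS)


def build_chains(words):
    """Greedily append each word to the first chain holding a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain is related to this word: start a new chain.
            chains.append([word])
    return chains


def chain_score(chain):
    """Toy strength score (an assumption): length times distinct-word count."""
    return len(chain) * len(set(chain))


def strong_chains(chains, top_k=1):
    """Return the top_k chains by score, highest first."""
    return sorted(chains, key=chain_score, reverse=True)[:top_k]


def keyword_sequence(chain):
    """Collapse a chain into distinct keywords in first-occurrence order."""
    return list(dict.fromkeys(chain))


words = ["meeting", "train", "agenda", "meeting", "ticket", "conference", "lunch"]
chains = build_chains(words)
best = strong_chains(chains, top_k=1)
print([keyword_sequence(chain) for chain in best])
```

Because related words fall into the same chain, a topic mentioned through several different words yields a single keyword sequence rather than several overlapping ones, which is the kind of redundancy removal the abstract refers to.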