Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (19): 135-139.

• Database, Signal and Information Processing •

Semantic recognition of altered Chinese junk short messages based on lexical chain

LIU Jinling (刘金岭), FENG Wanli (冯万利), GAO Li (高丽)

  1. Computer Engineering Faculty, Huaiyin Institute of Technology, Huai’an, Jiangsu 223003, China
  • Published: 2012-07-01 Online: 2012-06-27

Abstract: A lexical-chain-based method for recognizing altered Chinese junk short messages is proposed. The method constructs multiple lexical chains to capture the narrative threads of a short message text, then extracts from them the chains that carry the most content information, which also removes the redundancy of several keyword sequences expressing the same content. The resulting lexical chains serve as the representation of each message and are compared with one another to recognize altered junk messages. Experimental results show that the method identifies altered junk short messages fairly accurately.

Key words: lexical chain, junk short messages, variation
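
The abstract gives only the outline of the pipeline (build chains, keep the content-rich ones, compare messages by their chains), so the following is a minimal sketch of that outline rather than the authors' algorithm. Everything concrete in it is an assumption made for illustration: the toy `CONCEPT_OF` thesaurus, the rule that a chain is "strong" when it holds at least `min_len` words, the Jaccard overlap used for comparison, and the use of pre-segmented English tokens in place of Chinese text.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus mapping a word to a concept label.
# The abstract does not say which semantic resource the paper uses.
CONCEPT_OF: Dict[str, str] = {
    "invoice": "billing", "receipt": "billing", "tax": "billing",
    "loan": "finance", "credit": "finance", "mortgage": "finance",
    "call": "contact", "phone": "contact",
}


def build_chains(words: List[str]) -> Dict[str, List[str]]:
    """Group the words of one message into chains, one chain per concept."""
    chains: Dict[str, List[str]] = {}
    for word in words:
        concept = CONCEPT_OF.get(word)
        if concept is not None:  # words outside the thesaurus join no chain
            chains.setdefault(concept, []).append(word)
    return chains


def strong_chains(chains: Dict[str, List[str]], min_len: int = 2) -> Set[str]:
    """Keep the concepts whose chain has at least min_len words (assumed rule)."""
    return {concept for concept, ws in chains.items() if len(ws) >= min_len}


def chain_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of the strong-chain concept sets of two messages."""
    sa = strong_chains(build_chains(a))
    sb = strong_chains(build_chains(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Illustrative, made-up messages (already split into words).
known_spam = ["invoice", "tax", "receipt", "call", "phone"]
variant = ["receipt", "invoice", "phone", "call", "now"]
ham = ["call", "me", "tonight"]

print(chain_similarity(known_spam, variant))  # 1.0
print(chain_similarity(known_spam, ham))      # 0.0
```

Because each chain is reduced to its concept label, several keyword sequences that express the same content collapse into one entry, which is one simple way to model the redundancy removal the abstract mentions. A reworded variant of a known junk message then scores high even though its surface words differ.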