Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (20): 126-128.

• Database, Signal and Information Processing •

An Efficient Method for Extracting Topic Words from Short Texts

常 鹏1,马 辉2   

  1. School of Management, Tianjin University, Tianjin 300072
  2. Department of Management, Tianjin Institute of Urban Construction, Tianjin 300384
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-07-11 Published: 2011-07-11

Efficient short texts keyword extraction method analysis

CHANG Peng1,MA Hui2   

  1. Department of Management, Tianjin University, Tianjin 300072, China
  2. Department of Management, Tianjin Institute of Urban Construction, Tianjin 300384, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-07-11 Published:2011-07-11

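Abstract: To overcome problems such as topic drift and topic misjudgment in traditional topic-word extraction algorithms, this paper proposes using word co-occurrence information to improve the accuracy of topic-word extraction. Each word's weight score is adjusted according to its co-occurrence with the surrounding context words in the text, and words that co-occur frequently with the text's topic are extracted first as topic words, which improves extraction precision. Experiments show that the proposed method is more accurate than general topic-word extraction methods, and that it clearly outperforms them when the text is short.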

Keywords: topic-word extraction, word co-occurrence, topic extraction

Abstract: To overcome shortcomings of traditional subject extraction methods, such as theme drift and theme misjudgment, this paper proposes a new keyword extraction algorithm based on co-occurrence analysis. A word's weight is adjusted according to its ability to associate with other words: a word that co-occurs with more words has greater impact and is extracted first. The experimental results show that the keywords produced by the improved algorithm achieve better recall and precision than those of other methods.

Key words: keyword extraction, co-occurrence, subject extraction
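
The abstract describes adjusting each word's weight by its co-occurrence with context words but does not give the formula. The following is only a minimal sketch of that general idea, not the authors' algorithm. It assumes term frequency as the base score, sentence-level co-occurrence, and a hypothetical `alpha` parameter controlling the strength of the adjustment; the function name `extract_keywords` is likewise illustrative.

```python
from collections import Counter
from itertools import combinations


def extract_keywords(sentences, top_k=3, alpha=1.0):
    """Rank words by frequency, boosted by co-occurrence with frequent context words.

    sentences: list of token lists (tokenization and stop-word removal assumed done).
    alpha: hypothetical knob for how strongly co-occurrence adjusts the base score.
    """
    # Base score (assumption): plain term frequency over the whole text.
    tf = Counter(tok for sent in sentences for tok in sent)

    # For every unordered word pair, count the sentences in which both appear.
    cooc = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            cooc[(a, b)] += 1

    # Co-occurrence strength of a word: sum over its partners of
    # (shared sentence count * partner frequency).
    strength = Counter()
    for (a, b), n in cooc.items():
        strength[a] += n * tf[b]
        strength[b] += n * tf[a]

    # Normalize by the largest strength; fall back to 1 if no pairs exist.
    max_strength = max(strength.values(), default=0) or 1
    scores = {
        w: tf[w] * (1.0 + alpha * strength[w] / max_strength)
        for w in tf
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

In a short text, a word that is merely repeated in isolation can dominate a pure frequency ranking. With the adjustment enabled, words that appear together with other frequent words move ahead of it, which is the effect the abstract attributes to co-occurrence information.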