Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (14): 170-172.

• Database and Information Processing •

Automatic Extraction of Keyphrases from Scientific Articles Based on a Machine Learning Method

JiaBin Liu   

  • Received:2006-06-06 Revised:1900-01-01 Online:2007-05-10 Published:2007-05-10
  • Contact: JiaBin Liu

A Machine-Learning-Based Method for Automatic Keyword Extraction from Scientific Abstracts

刘佳宾 陈超 正荣 吉翔华   

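  1. Department of Electronic Engineering and Information Science, University of Science and Technology of China; Engineering College, Air Force Engineering University
  • Corresponding author: 刘佳宾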

Abstract: To extract keyphrases automatically from scientific articles, this paper proposes a supervised machine learning method. Candidate terms are identified by combining the n_grams method with part-of-speech (POS) information. Each term is represented by four features: term frequency, relative position of the first occurrence, relative position within the sentence, and the number of tokens in the term. Experimental results show that the method performs well and applies to articles from any field.

Key words: information retrieval, decision tree, part of speech, n_grams method
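
The candidate-term step described in the abstract, n_grams combined with part-of-speech filtering, can be illustrated with a minimal sketch. The abstract does not give the POS patterns or the tagger used, so the tiny lexicon `TOY_POS`, the pattern `ALLOWED_PATTERN` (adjectives and nouns ending in a noun), and the limit `max_n=3` are all hypothetical choices made only for illustration.

```python
import re

# Hypothetical toy POS lexicon; a real system would use a trained tagger.
TOY_POS = {
    "automatic": "ADJ", "keyphrase": "NOUN", "extraction": "NOUN",
    "uses": "VERB", "a": "DET", "decision": "NOUN", "tree": "NOUN",
}

# Assumed filter (not from the paper): zero or more adjectives/nouns,
# ending in a noun.
ALLOWED_PATTERN = re.compile(r"^(ADJ |NOUN )*NOUN$")


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def candidate_terms(sentence, max_n=3):
    """Return every n-gram (n <= max_n) whose POS sequence passes the filter."""
    tokens = tokenize(sentence)
    tags = [TOY_POS.get(token, "X") for token in tokens]  # "X" = unknown tag
    candidates = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            tag_sequence = " ".join(tags[i:i + n])
            if ALLOWED_PATTERN.match(tag_sequence):
                candidates.append(" ".join(tokens[i:i + n]))
    return candidates


if __name__ == "__main__":
    print(candidate_terms("Automatic keyphrase extraction uses a decision tree."))
```

Working on one sentence at a time keeps n-grams from crossing sentence boundaries; the surviving candidates would then be passed to the supervised classifier.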

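Abstract (Chinese version): This paper proposes a machine-learning-based technique for automatic keyword extraction, aimed mainly at the abstracts of academic papers in digital libraries. It is the first to propose using the sentence as the basic unit for keyword extraction. Candidate keywords are generated by combining the n_grams method with part-of-speech information. The features considered include the frequency of a phrase, its position within the whole abstract, its position within its sentence, and the number of words in the phrase. Experimental results show that the method adapts to keyword extraction for papers in all fields and achieves good results.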

Key words (Chinese version): automatic information extraction, decision tree, part-of-speech analysis, n_grams method
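
The four features named in the abstract can be sketched as follows. The exact definitions and normalizations used in the paper are not stated here, so this sketch assumes that positions are token offsets divided by the length of the abstract or sentence; the function names `split_sentences`, `find_term`, and `term_features` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def find_term(tokens, term_tokens):
    """Return every index at which term_tokens occurs inside tokens."""
    n = len(term_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term_tokens]


def term_features(abstract, term):
    """Compute the four features for one candidate term, or None if absent."""
    term_tokens = tokenize(term)
    if not term_tokens:
        return None
    sentences = [tokenize(s) for s in split_sentences(abstract)]
    total_tokens = sum(len(s) for s in sentences)
    frequency = 0
    first_in_abstract = None
    first_in_sentence = None
    offset = 0  # number of tokens in the sentences already processed
    for sentence_tokens in sentences:
        hits = find_term(sentence_tokens, term_tokens)
        if hits and first_in_abstract is None:
            first_in_abstract = (offset + hits[0]) / total_tokens
            first_in_sentence = hits[0] / len(sentence_tokens)
        frequency += len(hits)
        offset += len(sentence_tokens)
    if frequency == 0:
        return None
    return {
        "frequency": frequency,                     # term frequency
        "first_occurrence": first_in_abstract,      # relative position in abstract
        "position_in_sentence": first_in_sentence,  # relative position in sentence
        "num_tokens": len(term_tokens),             # number of tokens in the term
    }


if __name__ == "__main__":
    text = ("Keyphrase extraction is useful. "
            "We study keyphrase extraction with a decision tree.")
    print(term_features(text, "decision tree"))
```

A feature vector of this kind, paired with a keyword / non-keyword label, is the sort of input a decision tree learner would be trained on.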