Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (32): 126-129. DOI: 10.3778/j.issn.1002-8331.2009.32.040

• Database, Signal and Information Processing •

Research on automatic military intelligence term extraction using CRF model

JIA Mei-ying1,3, YANG Bing-ru1, ZHENG De-quan2,3, YANG Jing2

  1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China
  2. MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin 150001, China
  3. Beijing Graphic Institution, Beijing 100029, China
  • Received:2008-06-24 Revised:2009-04-17 Online:2009-11-11 Published:2009-11-11
  • Contact: JIA Mei-ying

采用CRF技术的军事情报术语自动抽取研究

贾美英1,3,杨炳儒1,郑德权2,3,杨 靖2   

  1. 北京科技大学 信息工程学院,北京 100083
  2. 哈尔滨工业大学 教育部-微软语言语音重点实验室,哈尔滨 150001
  3. 北京图形研究所,北京 100029
  • 通讯作者: 贾美英

Abstract: This paper presents a term extraction method based on Conditional Random Fields (CRF), intended for military intelligence processing. The method treats domain term extraction as a sequence labeling problem: it quantifies the distributional features of domain terms and uses them as training features, trains a domain term feature template with a CRF toolkit, and then applies that template to extract domain terms. The training corpus consists of news data from the military channel of Sohu Networks; the test corpus comprises all articles from issues 1 to 8 (2007) of the magazine Modern Military. The experiments achieved a precision of 73.24%, a recall of 69.57%, and an F-measure of 71.36%, indicating that the method is simple and feasible and can be applied to other domains.

Key words: term extraction, Conditional Random Fields (CRF), template
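
To make the sequence labeling formulation in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the B/I/O tag set, the feature names in `token_features`, and the example sentence are all illustrative assumptions, and the tag sequence is written by hand where a trained CRF model would normally predict it.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Context-window features for position i (hypothetical feature set)."""
    prev_tok = tokens[i - 1] if i > 0 else "<BOS>"
    next_tok = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return {
        "tok": tokens[i],
        "prev": prev_tok,
        "next": next_tok,
        "prev+tok": prev_tok + "|" + tokens[i],
    }


def decode_terms(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect spans tagged B (begin) followed by I (inside) as terms."""
    terms: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            # A new term starts; flush any term still being built.
            if current:
                terms.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            # O tag (or a stray I with no open term) closes the current term.
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Hand-labelled example; a CRF would predict `tags` from the features.
tokens = ["the", "air", "defense", "system", "intercepted", "a", "cruise", "missile"]
tags = ["O", "B", "I", "I", "O", "O", "B", "I"]

features = [token_features(tokens, i) for i in range(len(tokens))]
terms = decode_terms(tokens, tags)
print(terms)
```

In a full pipeline, per-token feature dictionaries such as these would be fed to a CRF toolkit for training, and `decode_terms` would be applied to the predicted tags of unseen text.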

摘要: 针对军事情报领域,提出了一种基于条件随机场的术语抽取方法,该方法将领域术语抽取看作一个序列标注问题,将领域术语分布的特征量化作为训练的特征,利用CRF工具包训练出一个领域术语特征模板,然后利用该模板进行领域术语抽取。实验采用的训练语料来自“搜狐网络军事频道”的新闻数据,测试语料选取《现代军事》杂志2007年第1~8期的所有文章。实验取得了良好的结果,准确率为73.24%,召回率为69.57%,F-测度为71.36%,表明该方法简单易行,且具有领域通用性。

关键词: 术语抽取, 条件随机场, 模板
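
The reported scores can be checked with a few lines of arithmetic. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (F1), which the abstract does not state explicitly; the function name `f_measure` is illustrative. It uses the precision of 73.24% and the recall of 69.57% given in the Chinese abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, as reported in the abstract.
f = f_measure(73.24, 69.57)
print(round(f, 2))
```

The result rounds to 71.36, matching the F-measure reported in the abstract.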
