Computer Engineering and Applications, 2013, Vol. 49, Issue (10): 147-149.

Terminology recognition based on conditional random fields

SHI Shuicai1,2, WANG Kai1, HAN Yanhua1,2, LV Xueqiang1,2   

1. Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101, China
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
Online: 2013-05-15; Published: 2013-05-14

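Research on Domain Terminology Recognition Based on Conditional Random Fields

SHI Shuicai1,2, WANG Kai1, HAN Yanhua1,2, LV Xueqiang1,2

1. Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101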

Abstract: Domain terms are the core vocabulary of every field. Building on a study of a large body of domain literature, this paper proposes a method for recognizing domain terminology. Relying on existing mature tools, the method uses a conditional random field (CRF) model to estimate the probabilities of the part-of-speech (POS) combinations that form domain terms. After the feature set is chosen, an optimal feature template is constructed by adjusting combinations of features and context windows, and the model's training parameters are determined by 10-fold cross-validation. Experimental results show that the proposed method provides a practical reference for terminology recognition.
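The abstract does not list the features or the template the authors settled on. As a hedged illustration of what adjusting features and window combinations in a feature template can involve, the Python sketch below expands a CRF++-style template into feature strings for one token. In that style, a macro `%x[row,col]` names the value at token offset `row` and column `col`. The two-column token table (word, POS), the padding strings, the `TEMPLATE` lines, and the helper names `cell` and `expand_template` are all assumptions made for illustration. They are not the paper's template.

```python
import re
from typing import List

# Hypothetical token table for one sentence: each row is [word, POS tag].
Sentence = List[List[str]]

# Hypothetical CRF++-style unigram template: a POS window of -2..+2,
# one POS-bigram feature, and the current word.
TEMPLATE = [
    "U00:%x[-2,1]",
    "U01:%x[-1,1]",
    "U02:%x[0,1]",
    "U03:%x[1,1]",
    "U04:%x[2,1]",
    "U05:%x[-1,1]/%x[0,1]",
    "U06:%x[0,0]",
]

# Matches a %x[row,col] macro; the row offset may be negative.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def cell(sent: Sentence, i: int, offset: int, col: int) -> str:
    """Value of column `col` at token i+offset, padded outside the sentence."""
    j = i + offset
    if j < 0:
        return f"__BOS{j}__"
    if j >= len(sent):
        return f"__EOS+{j - len(sent) + 1}__"
    return sent[j][col]


def expand_template(sent: Sentence, i: int, template: List[str] = TEMPLATE) -> List[str]:
    """Replace every macro in each template line with the observed value for token i."""
    return [
        MACRO.sub(lambda m: cell(sent, i, int(m.group(1)), int(m.group(2))), line)
        for line in template
    ]


# Illustrative sentence; the POS tags are placeholders, not output of a specific tagger.
sentence = [["条件", "n"], ["随机", "a"], ["场", "n"], ["识别", "v"]]
print(expand_template(sentence, 0))
```

Under these assumptions, widening or narrowing the window means adding or dropping lines such as `U00` and `U04`. Combined features such as `U05` encode POS combinations directly. Each template variant would then be compared on held-out data before one is kept.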

Key words: terminology, conditional random field (CRF), part-of-speech (POS) combination, feature template

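Abstract: Domain terms are the core vocabulary of each field. Based on a study of a large body of domain literature, a method for recognizing domain terms is proposed. Relying on existing mature tools, the method uses a conditional random field model to estimate the probabilities of the part-of-speech combinations of domain terms. After the feature set is selected, an optimal feature template is constructed by adjusting combinations of features and windows, and the model's training parameters are determined by 10-fold cross-validation. Experimental results show that domain terms can be recognized effectively by using a conditional random field model to analyze the probabilities of their part-of-speech combinations.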
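The abstract states only that the model's training parameters were fixed by 10-fold cross-validation. It does not name the parameters or the evaluation metric. The sketch below shows the generic procedure with those gaps left open. The names `k_fold_splits`, `select_parameter`, and `toy_train_and_score` are hypothetical, and the candidate values are arbitrary. The toy scorer is only a placeholder for training and evaluating a CRF on each split.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
P = TypeVar("P")


def k_fold_splits(items: Sequence[T], k: int = 10) -> List[Tuple[List[T], List[T]]]:
    """Split items (e.g. sentences) into k contiguous folds; each fold is held out once."""
    n = len(items)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(items)")
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    splits = []
    start = 0
    for size in sizes:
        held_out = list(items[start:start + size])
        train = list(items[:start]) + list(items[start + size:])
        splits.append((train, held_out))
        start += size
    return splits


def select_parameter(
    items: Sequence[T],
    candidates: Sequence[P],
    train_and_score: Callable[[List[T], List[T], P], float],
    k: int = 10,
) -> Tuple[P, float]:
    """Return the candidate with the highest mean held-out score over k folds."""
    splits = k_fold_splits(items, k)
    best, best_score = None, float("-inf")
    for c in candidates:
        scores = [train_and_score(train, test, c) for train, test in splits]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best, best_score = c, mean
    return best, best_score


# Toy stand-in for "train a CRF with parameter c on `train`, return a score on `test`".
def toy_train_and_score(train, test, c):
    return -abs(c - 1.0)


print(select_parameter(list(range(20)), [0.5, 1.0, 2.0], toy_train_and_score))
```

In real use, the sentences would normally be shuffled once with a fixed seed before splitting. Splitting by sentence or document, rather than by token, keeps a term's context from leaking between training and held-out folds.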

Key words: domain terminology, conditional random field, part-of-speech combination, feature template