Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (11): 141-147.


Automatic entity identification based on CRF and multilevel algorithm model

LIU Yin1, LV Xueqiang1, LIU Kun2   

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
  2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
  • Online: 2016-06-01  Published: 2016-06-14

条件随机场与多层算法模型的实体自动识别

刘  殷1,吕学强1,刘  坤2   

  1. 北京信息科技大学 网络文化与数字传播北京市重点实验室,北京 100101
  2. 北京拓尔思信息技术股份有限公司,北京 100101

Abstract: Automatic entity identification is a powerful means of obtaining information and one of the key technologies in natural language processing (NLP). Most current research addresses named entity identification, which is nearly mature, whereas other kinds of entity mentions in Chinese text, such as nominal and pronominal mentions, have received little attention. This paper proposes a method that identifies named and nominal entity mentions in a single, integrated process. The method first feeds character-level features of each entity, together with word segmentation and part-of-speech (POS) tagging information, into a conditional random field (CRF). It then applies a multilevel algorithm model to refine the entities already identified and to recall entities that the CRF missed. In experiments on the standard ACE corpus, the method achieves an accuracy of 75.56% and a recall of 72.52%. The results show that the method is effective for the entity identification problem.

Key words: entity identification, conditional random field, segmentation, multilevel algorithm model
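
The abstract does not give the exact feature templates, so the following is only a minimal sketch of how character, segmentation, and POS information could be combined into per-character CRF input. The function names (`char_features`, `bio_labels`), the B/M/E/S segmentation encoding, the BIO label scheme, the POS tags, and the example sentence are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple


def char_features(tagged_words: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Expand (word, POS) pairs into one feature dict per character.

    Assumed encoding: 'seg' marks the character's position inside its
    word (B = begin, M = middle, E = end, S = single-character word).
    """
    rows: List[Dict[str, str]] = []
    for word, pos in tagged_words:
        for i, ch in enumerate(word):
            if len(word) == 1:
                seg = "S"
            elif i == 0:
                seg = "B"
            elif i == len(word) - 1:
                seg = "E"
            else:
                seg = "M"
            rows.append({"char": ch, "seg": seg, "pos": pos})
    # Add neighbouring characters as simple context features.
    for i, row in enumerate(rows):
        row["prev_char"] = rows[i - 1]["char"] if i > 0 else "<BOS>"
        row["next_char"] = rows[i + 1]["char"] if i < len(rows) - 1 else "<EOS>"
    return rows


def bio_labels(n_chars: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, type) entity spans into BIO labels."""
    labels = ["O"] * n_chars
    for start, end, etype in spans:
        labels[start] = "B-" + etype
        for i in range(start + 1, end):
            labels[i] = "I-" + etype
    return labels


# Hypothetical segmented and POS-tagged sentence: "北京 / 的 / 学生".
sentence = [("北京", "ns"), ("的", "u"), ("学生", "n")]
features = char_features(sentence)
# One named mention (北京) and one nominal mention (学生), both assumed.
labels = bio_labels(len(features), [(0, 2, "GPE"), (3, 5, "PER")])
```

Sequences of such feature dicts and labels are the usual input for a linear-chain CRF trainer. The multilevel refinement stage is not sketched here because the abstract does not describe its rules.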

摘要: 实体自动识别技术是人们获取信息的有力手段,也是自然语言处理研究的关键技术之一。目前命名实体识别的研究较多,且已趋于成熟,而对汉语文本中的其他实体(名词性、代词性)研究较少。因此提出了一体化识别命名实体识别和名词性实体的方法,该方法将实体的汉字、分词、词性标注等信息引入条件随机场;再利用多层算法模型优化已经识别出的实体,以及召回未识别出的实体。在标准ACE语料库上进行实验,正确率达到75.56%,召回率达到72.52%。结果表明该方法对于实体识别问题是有效的。

关键词: 实体识别, 条件随机场, 分词, 多层算法模型