Computer Engineering and Applications (计算机工程与应用), 2009, Vol. 45, Issue 4: 227-228. DOI: 10.3778/j.issn.1002-8331.2009.04.066

• Engineering and Applications •

Automatic recognition of Chinese name based on maximum entropy (基于最大熵模型的中国人名自动识别)

CAO Bo (曹 波), SU Yi-dan (苏一丹), DENG Qi (邓 琦)

  1. School of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  • Received: 2008-01-04  Revised: 2008-03-27  Online: 2009-02-01  Published: 2009-02-01
  • Contact: CAO Bo

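Abstract (Chinese version): A maximum entropy model is used to recognize Chinese personal names automatically. First, the part-of-speech tags in the corpus are replaced with roles. Next, feature templates are applied to the role-tagged corpus to extract a feature set, and the maximum entropy parameters for this feature set are trained with the IIS (improved iterative scaling) algorithm. Finally, the Viterbi algorithm assigns roles to coarsely segmented text, and maximum pattern matching over the resulting role sequence identifies the Chinese personal names. In a closed test, precision, recall, and F-measure reached 85.4%, 91.2%, and 88.2%, respectively.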

Keywords (Chinese version): Chinese name recognition, maximum entropy model, Viterbi algorithm
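The closed-test figures above (85.4%, 91.2%, 88.2%) can be checked arithmetically. The sketch below assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall. The record does not state which weighting was used, so this is an assumption, and `f_measure` is an illustrative name rather than anything from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall; the source
    does not say which F-measure variant was reported.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as percentages, taken from the abstract above.
print(round(f_measure(85.4, 91.2), 1))  # 88.2
```

Under that assumption the reported F-measure follows from the reported precision and recall.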

Abstract: The authors use a maximum entropy model to recognize Chinese personal names automatically. First, the part-of-speech (POS) tags in the corpus are replaced with roles. Second, feature templates are used to extract a feature set from the role-tagged corpus. Third, the parameters of the feature set are trained with the IIS algorithm. Finally, the Viterbi algorithm tags the roughly segmented text with roles, and candidate names are recognized by maximum pattern matching on the role sequence. The closed test shows that precision, recall, and F-measure reach 75.6%, 91.4%, and 82.8%, respectively.

Key words: Chinese name recognition, maximum entropy model, Viterbi algorithm
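The last two steps of the pipeline, Viterbi role tagging followed by maximum pattern matching on the role sequence, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The role inventory (`A`, `B`, `C`, `D`), the name patterns, the surname list, and `toy_log_score` are all hypothetical. In particular, `toy_log_score` is a hand-written stand-in for the log-probabilities that a trained maximum entropy model would supply.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical role inventory (not taken from the paper):
# B = surname, C = first given-name character,
# D = second given-name character, A = anything else.
ROLES = ["A", "B", "C", "D"]
NAME_PATTERNS = ["BCD", "BC"]   # assumed role patterns that form a name
SURNAMES = {"王", "李"}          # toy surname list for the demo

Scorer = Callable[[str, str, Sequence[str], int], float]


def toy_log_score(prev: str, role: str, tokens: Sequence[str], i: int) -> float:
    """Stand-in for log P(role | prev role, context) from a trained model."""
    tok = tokens[i]
    if role == "B":
        p = 0.9 if tok in SURNAMES else 0.01
    elif role == "C":
        p = 0.8 if prev == "B" and len(tok) == 1 else 0.01
    elif role == "D":
        p = 0.6 if prev == "C" and len(tok) == 1 else 0.01
    else:
        p = 0.3
    return math.log(p)


def viterbi(tokens: Sequence[str], roles: Sequence[str], log_score: Scorer) -> List[str]:
    """Return the highest-scoring role sequence for the tokens."""
    if not tokens:
        return []
    # best[i][r] = (best score of a path ending in role r at position i, backpointer)
    best = [{r: (log_score("<s>", r, tokens, 0), None) for r in roles}]
    for i in range(1, len(tokens)):
        column = {}
        for r in roles:
            column[r] = max(
                (best[i - 1][p][0] + log_score(p, r, tokens, i), p) for p in roles
            )
        best.append(column)
    # Follow backpointers from the best final role.
    path = [max(roles, key=lambda r: best[-1][r][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def match_names(tokens: Sequence[str], roles: Sequence[str],
                patterns: Sequence[str]) -> List[str]:
    """Scan the role string left to right, trying the longest pattern first."""
    role_string = "".join(roles)          # relies on single-character role labels
    ordered = sorted(patterns, key=len, reverse=True)
    names, i = [], 0
    while i < len(role_string):
        for pat in ordered:
            if role_string.startswith(pat, i):
                names.append("".join(tokens[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            i += 1
    return names


tokens = ["记者", "王", "小", "明", "报道"]   # coarsely segmented input
roles = viterbi(tokens, ROLES, toy_log_score)
names = match_names(tokens, roles, NAME_PATTERNS)
print(roles)   # ['A', 'B', 'C', 'D', 'A']
print(names)   # ['王小明']
```

Trying longer patterns first is what makes the matching "maximum": the three-token pattern `BCD` wins over `BC` whenever both fit at the same position.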