Computer Engineering and Applications, 2007, Vol. 43, Issue (35): 1-4.

• Doctoral Forum •

Identification of Chinese Names Based on Maximum Entropy Model and Rules

JIA Ning1,2, ZHANG Quan2

  

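  1. Graduate School of Chinese Academy of Sciences, Beijing 100039, China
    2. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080, China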
  • Online: 2007-12-11  Published: 2007-12-11
  • Corresponding author: JIA Ning

Abstract: Identification of Chinese names is an important technique in Chinese information processing, and its recall strongly affects the other Chinese information processing tasks that build on it; yet most existing methods fail to reach a recall of 90%. This paper presents a method that combines a statistical model with processing rules: a maximum entropy model first identifies potential surnames, and decision rules then process these candidates further. Open tests on a real-world corpus show that the method has a clear advantage in recall, exceeding 94% while keeping precision above 84%. The method is practical, and its main strength is its high recall.

Key words: Chinese name recognition, maximum entropy, rules
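The two-stage pipeline in the abstract (a maximum entropy model proposes potential surnames, then rules decide on the full name) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The feature templates (`cur`, `prev`, `next`, `bias`), the stop-character rule in the hypothetical `apply_rules`, the 0.5 threshold and the toy corpus are all invented for illustration. The weights are fitted by plain gradient ascent instead of the iterative scaling algorithms (GIS/IIS) classically used for maximum entropy models. The model has the standard conditional form p(y | x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where y ∈ {0, 1} says whether the character at position x starts a name.

```python
import math
from collections import defaultdict

# Hypothetical punctuation/function-word list used by the toy rule stage.
STOP_CHARS = set("，。、的了是在和说")


def features(sent, i):
    """Hypothetical context features for 'does sent[i] start a person name?'."""
    prev = sent[i - 1] if i > 0 else "<s>"
    nxt = sent[i + 1] if i + 1 < len(sent) else "</s>"
    return [f"cur={sent[i]}", f"prev={prev}", f"next={nxt}", "bias"]


class MaxEntBinary:
    """Conditional maximum entropy model p(y | x) for y in {0, 1}.

    p(y | x) = exp(sum_f w[f, y]) / Z(x), with binary indicator features.
    Weights are fitted by plain gradient ascent on the conditional
    log-likelihood (no smoothing prior, to keep the sketch short).
    """

    def __init__(self):
        self.w = defaultdict(float)

    def prob(self, feats):
        scores = [sum(self.w.get((f, y), 0.0) for f in feats) for y in (0, 1)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)  # partition function Z(x), up to the factor exp(m)
        return [e / z for e in exps]

    def fit(self, examples, epochs=200, lr=0.5):
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, y in examples:
                p = self.prob(feats)
                for f in feats:
                    for label in (0, 1):
                        # Observed minus model-expected feature count.
                        grad[(f, label)] += (label == y) - p[label]
            for key, g in grad.items():
                self.w[key] += lr * g / len(examples)
        return self


def make_examples(corpus):
    """(sentence, set of surname positions) pairs -> (features, label) examples."""
    return [(features(s, i), int(i in pos)) for s, pos in corpus for i in range(len(s))]


def apply_rules(sent, i):
    """Hypothetical rule: surname + up to two given-name chars, stopping at STOP_CHARS."""
    given = ""
    for ch in sent[i + 1 : i + 3]:
        if ch in STOP_CHARS:
            break
        given += ch
    return sent[i] + given if given else None


def recognise(model, sent, threshold=0.5):
    """Stage 1: ME model proposes surnames; stage 2: rules accept or reject them."""
    names = []
    for i in range(len(sent) - 1):  # a surname needs at least one following char
        if model.prob(features(sent, i))[1] >= threshold:
            name = apply_rules(sent, i)
            if name:
                names.append(name)
    return names


# Toy training corpus (invented for illustration, not the paper's data).
CORPUS = [
    ("张全说了", {0}),
    ("贾宁说了", {0}),
    ("我和张全", {2}),
    ("他是贾宁", {2}),
    ("今天下雨", set()),
    ("我说了", set()),
]

model = MaxEntBinary().fit(make_examples(CORPUS))
print(recognise(model, "我和贾宁"))
```

On this toy corpus the sentence 我和贾宁 yields the single name 贾宁: 贾 was only ever seen as a surname, and the rule stage takes the one remaining character as the given name. Splitting the work this way lets the statistical stage favour recall (a lower threshold keeps more candidate surnames) while the rules prune false positives, which fits the recall-oriented goal stated in the abstract; how the original system balances the two is not described here.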