Computer Engineering and Applications, 2009, Vol. 45, Issue (28): 230-232. DOI: 10.3778/j.issn.1002-8331.2009.28.069

• Engineering and Applications •

Chinese Place Name Recognition Based on the Analysis of Place-Name Characters

LI Nuo1,2, ZHANG Quan2

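  1. Graduate University of Chinese Academy of Sciences, Beijing 100039
    2. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190
  • Received: 2008-06-02 Revised: 2008-09-12 Published: 2009-10-01 Online: 2009-10-01
  • Corresponding author: LI Nuo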

Chinese place name identification with Chinese character features

LI Nuo1,2, ZHANG Quan2

  1. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
    2. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2008-06-02 Revised: 2008-09-12 Online: 2009-10-01 Published: 2009-10-01
  • Contact: LI Nuo

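Abstract: To recognize Chinese place names that are out-of-vocabulary (unregistered) words, this work first thoroughly mines the features of the characters used in place names themselves and of the characters in their context, and then integrates this knowledge from different sources with a maximum entropy model. During feature selection and knowledge acquisition, a targeted analysis of Chinese place names as a specific group yields additional information, such as which characters Chinese place names use most often and which combinations of these characters are most common. The resulting system achieves F-scores of 87% in a closed test and 83% in an open test on a real corpus.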
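The abstract describes combining character-level and context-level knowledge in a maximum entropy model. The sketch below shows one way such an integration can look, under assumptions that are ours rather than the paper's: each character receives a B/I/O tag (begin, inside, outside a place name), the feature templates (current, previous and next character, a character bigram, and membership in a hypothetical list of place-name suffix characters such as 市 and 县) are illustrative, and the two-sentence training set is made up for the example. The model is a conditional maximum entropy classifier (equivalently, multinomial logistic regression) trained by batch gradient ascent on the L2-regularized conditional log-likelihood; it is not the authors' implementation.

```python
import math
from collections import defaultdict

# Hypothetical list of common place-name suffix characters (illustrative only).
PLACE_SUFFIXES = set("省市县区镇乡村州")


def char_features(sentence, i):
    """Context features for the character at position i (illustrative templates)."""
    prev_c = sentence[i - 1] if i > 0 else "<S>"
    next_c = sentence[i + 1] if i + 1 < len(sentence) else "</S>"
    feats = ["cur=" + sentence[i], "prev=" + prev_c, "next=" + next_c,
             "bigram=" + prev_c + sentence[i]]
    if sentence[i] in PLACE_SUFFIXES:
        feats.append("cur_is_suffix")
    if next_c in PLACE_SUFFIXES:
        feats.append("next_is_suffix")
    return feats


class MaxEnt:
    """Conditional maxent model: p(y|x) is proportional to exp(sum of weights[(f, y)])."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # keyed by (feature, label)

    def probs(self, feats):
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)  # partition function Z(x)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def train(self, data, epochs=200, lr=0.1, l2=0.01):
        """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0      # observed feature count
                    for y in self.labels:
                        grad[(f, y)] -= p[y]    # expected feature count under the model
            for key in set(grad) | set(self.weights):
                self.weights[key] += lr * (grad[key] - l2 * self.weights[key])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)


def to_examples(sentence, tags):
    """Pair each character's feature list with its B/I/O tag."""
    return [(char_features(sentence, i), t) for i, t in enumerate(tags)]


# Toy training data made up for illustration: one B/I/O tag per character.
train_data = to_examples("我去北京市", "OOBII") + to_examples("他在上海市工作", "OOBIIOO")
model = MaxEnt(["B", "I", "O"])
model.train(train_data)
print("".join(model.predict(char_features("我去北京市", i)) for i in range(5)))
```

Because every feature is just a string, evidence from different sources (character identity, surrounding characters, a suffix list) enters the model in the same way, which is what makes maximum entropy convenient for combining heterogeneous knowledge. A practical tagger would also need a decoding step that keeps the B/I/O sequence well-formed.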

Keywords: Chinese place name recognition, place-name character analysis, maximum entropy

Abstract: This paper first extracts features from Chinese place names and their context, and then uses a maximum entropy model to integrate these features from different sources. Before the feature functions are defined, additional information is obtained by analyzing the characters used in Chinese place names, focusing on which characters are used most frequently and how these characters combine with one another. The system finally achieves an F-score of 83% in an open test on a real corpus (87% in a closed test).
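The character analysis mentioned in the abstract (which characters place names favor and how they combine) can be approximated with simple counting. The sketch below is an illustration under our own assumptions, not the authors' procedure: it counts character frequencies, positional frequencies (first, middle or last character of a name) and adjacent-character bigrams over a list of place names. The function names and the four-name sample list are hypothetical.

```python
from collections import Counter


def position_of(i, n):
    """Label position i inside a name of length n as first, middle or last."""
    if i == 0:
        return "first"
    if i == n - 1:
        return "last"
    return "middle"


def analyze_place_name_chars(place_names):
    """Count characters, (position, character) pairs and character bigrams in place names."""
    char_freq = Counter()
    position_freq = Counter()
    bigram_freq = Counter()
    for name in place_names:
        n = len(name)
        for i, c in enumerate(name):
            char_freq[c] += 1
            position_freq[(position_of(i, n), c)] += 1
        for i in range(n - 1):
            bigram_freq[name[i:i + 2]] += 1  # adjacent character pair within the name
    return char_freq, position_freq, bigram_freq


# Hypothetical sample list of place names.
names = ["北京市", "上海市", "南京市", "北海市"]
chars, positions, bigrams = analyze_place_name_chars(names)
print(chars.most_common(3))
print(positions[("last", "市")])
print(bigrams.most_common(2))
```

A high count for a (position, character) pair such as ("last", "市") is the kind of statistic that could be turned into a feature function, for example a binary feature that fires when the current character is a frequent place-name suffix.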

Key words: place name recognition, place name analysis, maximum entropy
