Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (13): 129-131. DOI: 10.3778/j.issn.1002-8331.2010.13.038

• Database, Signal and Information Processing •

CRFs-based approach to recognition of Chinese address element

JIANG Wen-ming, ZHANG Xue-ying, LI Bo-qiu

  1. Key Lab of Virtual Geographical Environment, Ministry of Education, Nanjing Normal University, Nanjing 210046, China
  • Received: 2010-01-06 Revised: 2010-02-21 Online: 2010-05-01 Published: 2010-05-01
  • Contact: JIANG Wen-ming

基于条件随机场的中文地址要素识别方法

蒋文明,张雪英,李伯秋   

  1. 南京师范大学 虚拟地理环境教育部重点实验室,南京 210046
  • 通讯作者: 蒋文明

Abstract: Because Chinese addresses are named inconsistently and because of the characteristics of the Chinese language, recognizing Chinese address elements is a key technique in geocoding. Traditional methods based on feature-character matching and dictionary (gazetteer) matching struggle to handle the diversity of address element names. Drawing on natural language processing techniques, this paper constructs an annotation set for address elements and designs a recognition method for Chinese address elements based on conditional random fields (CRFs). Experiments show that the CRFs-based method improves recognition results considerably compared with a rule-based method that relies on feature characters. Because the CRFs model generalizes well, the method is more widely applicable and is particularly suited to batch parsing of large-scale address data and to fast geocoding in mass-market location-based services (LBS).

Key words: geocoding, Chinese address element, natural language processing, conditional random fields

摘要: 由于中文地址命名的不规范性和汉语语言特点,中文地址要素识别成为地址编码的关键技术。传统的特征字匹配和字典匹配方法,难以解决地址要素命名的多样性问题。借鉴自然语言处理技术,通过构建地址要素标注集,设计了基于条件随机场的中文地址要素识别方法。实验证明,与基于特征字的规则方法相比,基于条件随机场的方法能够在较大程度上提高识别效果。由于条件随机场模型具有较好的泛化能力,该方法具有更强的通用性,特别适宜于大规模地址数据的批量解析和大众化位置服务中地址编码的快速处理。

关键词: 地址编码, 中文地址要素, 自然语言处理, 条件随机场
