Computer Engineering and Applications (计算机工程与应用) ›› 2023, Vol. 59 ›› Issue (21): 66-82. DOI: 10.3778/j.issn.1002-8331.2302-0381

• Hot Topics and Reviews •

地名实体识别研究与展望

王文涛,奚雪峰,崔志明,徐川   

  1. 苏州科技大学 电子与信息工程学院,江苏 苏州 215000
  2. 苏州市虚拟现实智能交互及应用技术重点实验室,江苏 苏州 215000
  3. 苏州智慧城市研究院,江苏 苏州 215000
  4. 昆山市社会治理现代化综合指挥中心,江苏 昆山 215300
  • 出版日期:2023-11-01 发布日期:2023-11-01

Research and Prospect of Toponym Entity Recognition

WANG Wentao, XI Xuefeng, CUI Zhiming, XU Chuan   

  1. School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China
  2. Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000, China
  3. Suzhou University of Science and Technology Smart City Research Institute, Suzhou, Jiangsu 215000, China
  4. Kunshan Social Governance Modernization Comprehensive Command Center, Kunshan, Jiangsu 215300, China
  • Online: 2023-11-01 Published: 2023-11-01

摘要: 地名作为一种常见的命名实体,广泛存在于非结构化文本中。是非结构化数据转为结构化过程中重要的关联实体。为了全面了解地名识别的最新研究成果和现状,概述了地名识别现有的应用场景、地名识别技术在具体场景的详细应用以及地名识别数据集和评价指标。总结分析了现有的地名识别方法:基于规则和地名词典匹配的方法、基于统计机器学习的方法、基于深度学习模型和混合模型方法。归纳总结了每一种地名识别方法的关键思路、优缺点和具体模型。同时对混合方法的融合特征和模型特点进行了总结归纳。并从模型性能展开比对分析,以及对词嵌入模型和预训练模型的模型特点进行了总结归纳。对地名实体识别研究方向进行总结和展望。

关键词: 命名实体识别, 地名实体识别, 自然语言处理, 深度学习, 信息抽取

Abstract: As a common type of named entity, toponyms (place names) occur widely in unstructured text and are important linking entities in the process of transforming unstructured data into structured data. To give a comprehensive picture of the latest research results and the current state of toponym recognition, this paper first outlines its existing application scenarios, the detailed application of toponym recognition techniques in specific scenarios, and the datasets and evaluation metrics used. It then summarizes and analyzes existing toponym recognition methods: methods based on rules and gazetteer matching, methods based on statistical machine learning, methods based on deep learning models, and hybrid-model methods. For each method, the key ideas, advantages and disadvantages, and specific models are summarized. The fused features and model characteristics of the hybrid methods are also summarized, model performance is compared and analyzed, and the characteristics of word embedding models and pre-trained models are summarized. Finally, research directions for toponym entity recognition are summarized and future prospects are discussed.

Key words: named entity recognition, toponym entity recognition, natural language processing, deep learning, information extraction