Computer Engineering and Applications, 2024, Vol. 60, Issue (1): 15-27. DOI: 10.3778/j.issn.1002-8331.2304-0398

• Hot Topics and Reviews •

Survey of Chinese Named Entity Recognition Research

ZHAO Jigui, QIAN Yurong, WANG Kui, HOU Shuxiang, CHEN Jiaying   

  1. School of Software, Xinjiang University, Urumqi 830000, China
  2. Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
  3. Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830000, China
  4. School of Economics and Management, University of the Chinese Academy of Sciences, Beijing 101408, China
  5. School of Information Science and Engineering, Xinjiang University, Urumqi 830000, China
  • Online: 2024-01-01 Published: 2024-01-01

Abstract: Named entity recognition (NER) is one of the most fundamental tasks in natural language processing; its goal is to identify the boundaries and types of entities with specific meaning in natural-language text. Chinese named entity recognition (CNER), however, faces ambiguous word boundaries, semantic diversity, indistinct morphological features, and a shortage of Chinese corpora, all of which make it difficult to improve CNER performance substantially. This paper first introduces the datasets, annotation schemes, and evaluation metrics used in CNER. It then follows the development of CNER research and divides CNER methods into three categories: rule-based, statistics-based, and deep learning-based methods, and summarizes the main deep learning-based CNER models of the past five years. Finally, it discusses research trends in CNER to provide a reference for proposing new methods and for future research directions.

Key words: natural language processing, Chinese named entity recognition (CNER), deep learning, pre-trained models, machine learning
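
To make the task in the abstract concrete (identifying entity boundaries and types, then scoring the result), the following is a minimal illustrative sketch. The abstract does not name a specific annotation scheme or metric, so the character-level BIO scheme and exact-match entity-level precision, recall, and F1 used here are assumptions chosen because they are common in NER work. The function names `bio_to_spans` and `prf1`, the example sentence, and its tags are all hypothetical.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, entity_type) over character positions.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode character-level BIO tags into entity spans.

    Assumes the BIO scheme: B-X opens an entity of type X, I-X continues it,
    and O marks a character outside any entity. Stray I-X tags with no open
    entity of the same type are ignored.
    """
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" acts as a sentinel that flushes the final open span.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact boundary and type match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: Chinese has no spaces, so tags are assigned per character.
sentence = "小明在新疆大学工作"
gold_tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]
# A hypothetical prediction that gets the person right but cuts the
# organization short and mislabels it as a location.
pred_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
for s, e, t in gold_spans:
    print(sentence[s:e], t)
print(prf1(gold_spans, pred_spans))
```

In this example the prediction recovers only one of the two gold entities exactly, which shows how a boundary error on an unsegmented Chinese string counts against both precision and recall under exact-match scoring.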