Computer Engineering and Applications, 2023, Vol. 59, Issue (24): 46-69. DOI: 10.3778/j.issn.1002-8331.2302-0361

• Hot Topics and Reviews •

Research Progress on Named Entity Recognition in Chinese Deep Learning

LI Li, XI Xuefeng, SHENG Shengli, CUI Zhiming, XU Jiabao   

  1. School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China
    2. Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000, China
    3. Suzhou Smart City Research Institute, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China
    4. Texas Tech University, Lubbock, Texas 79401, USA
  • Online: 2023-12-15 Published: 2023-12-15

Abstract: Chinese named entity recognition (CNER) is the task of identifying and categorizing entities with specific meanings in Chinese text, and it is an important foundation for many downstream natural language processing tasks. In recent years, deep learning has gradually become the mainstream approach to CNER: through end-to-end learning it automatically acquires deeper and more abstract data features, reducing the reliance on manual annotation and addressing the data sparsity problem of high-dimensional feature spaces. This article first reviews the development of named entity recognition and outlines the particular characteristics and difficulties of CNER. Based on the distinct processing characteristics of CNER, it then groups deep learning-based CNER methods into three areas: the flat entity boundary problem, Chinese nested named entity recognition, and the small-sample problem in CNER. For each area it describes the models, sub-areas, and recent research progress, and compiles the experimental results of several representative deep learning methods on relevant datasets. It then summarizes the datasets and evaluation methods commonly used for CNER, and finally identifies the challenges facing current CNER techniques and directions for future research.

Key words: Chinese named entity recognition, deep learning, entity boundary, Chinese nested named entity recognition, low-resource Chinese named entity recognition
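The abstract refers to the evaluation methods commonly used for CNER and to the entity boundary problem. As a minimal sketch, assuming the widely used character-level BIO tagging scheme and exact-match, entity-level precision/recall/F1, the Python below decodes tag sequences into entity spans and scores a prediction against the gold spans. The helper names `bio_to_spans` and `prf1`, the example sentence, and its tags are hypothetical illustrations. They are not the survey's own code, data, or evaluation protocol.

```python
from typing import List, Optional, Set, Tuple

# (start, end_exclusive, entity_type) over character positions
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Hypothetical helper: decode character-level BIO tags into entity spans.

    Assumes tags of the form "B-TYPE", "I-TYPE" or "O". An I- tag that does not
    continue an open entity of the same type starts a new entity (a lenient
    convention; strict schemes may discard such spans instead).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if etype is not None:          # close the entity that was open
                spans.add((start, i, etype))
            start, etype = i, tag[2:]      # open a new entity
        elif tag == "O":
            if etype is not None:
                spans.add((start, i, etype))
            start, etype = None, None
        # otherwise: I- tag continuing the current entity, nothing to do
    if etype is not None:                  # entity running to the end
        spans.add((start, len(tags), etype))
    return spans


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted span counts only if boundaries AND type match."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Hypothetical example sentence and tags, for illustration only.
    chars = list("李莉在苏州科技大学")
    gold_tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG"]
    pred_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
    assert len(chars) == len(gold_tags) == len(pred_tags)

    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    for s, e, t in sorted(pred):
        print(t, "".join(chars[s:e]))
    p, r, f = prf1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.333 R=0.500 F1=0.400
```

In this toy example the prediction splits 苏州科技大学 into a LOC span 苏州 and an ORG span 科技大学, so under exact matching only the PER entity counts as correct. The same sentence also shows why nested entities are treated as a separate problem: 苏州 is itself a location inside the organization name, but a single flat BIO sequence can encode only one of the two readings.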