Computer Engineering and Applications (计算机工程与应用) ›› 2024, Vol. 60 ›› Issue (16): 34-48. DOI: 10.3778/j.issn.1002-8331.2312-0260

• Hot Topics and Reviews •

Review of Data-Driven Approaches to Chinese Named Entity Recognition

XIAO Lei (肖蕾), CHEN Zhenjia (陈镇家)

  1. School of Automation, Guangdong Polytechnic Normal University, Guangzhou 510450, China
  • Online: 2024-08-15 Published: 2024-08-15

Abstract: Chinese named entity recognition (CNER) is a key step in Chinese information extraction and underpins downstream tasks such as question answering, machine translation, and knowledge graphs. CNER methods fall into two main categories: knowledge-driven and data-driven. Traditional knowledge-driven methods based on rules, dictionaries, and machine learning tend to ignore contextual semantic information, incur high computational cost, and suffer from low recall, which has limited the development of CNER technology. This paper first introduces the definition and development history of CNER. It then organizes in detail the typical datasets, training tools, sequence labeling schemes, and model evaluation metrics for CNER tasks. Next, it summarizes data-driven methods, dividing them into methods based on deep learning, methods based on pre-trained language models, and methods for joint extraction of Chinese entities and relations, and analyzes the practical application scenarios of data-driven methods in different fields. Finally, it discusses future research directions for CNER, providing a reference for the development of new methods.

Key words: Chinese named entity recognition, data-driven, deep learning, knowledge graph