Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (6): 199-206. DOI: 10.3778/j.issn.1002-8331.2211-0119

• Pattern Recognition and Artificial Intelligence •

Chinese Named Entity Recognition Methods Combined with Entity Boundary Cues

HUANG Rong (黄蓉), CHEN Yanping (陈艳平), HU Ying (扈应), HUANG Ruizhang (黄瑞章), QIN Yongbin (秦永彬)

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
  2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Online: 2024-03-15 Published: 2024-03-15

Abstract: Named entity recognition (NER) is a basic task in information extraction. It supports downstream tasks such as machine translation and relation extraction, and is therefore of considerable research significance. To address the problem of fuzzy entity boundaries in Chinese named entity recognition methods, a named entity recognition model that incorporates entity boundary cues is proposed. The model consists of three modules: boundary detection, cue generation, and entity classification. First, the boundary detection module identifies entity boundaries. Next, the cue generation module generates entity spans from the boundary information and produces a text sequence annotated with boundary cue tags. Through these cue tags, the model perceives the entity boundaries in the sentence and learns the semantic dependencies between entity boundaries and their context. Finally, the text sequence with boundary cue tags is fed to the entity classification module, where a biaffine mechanism strengthens the semantic interaction between tags, and the joint prediction of the biaffine mechanism and a multilayer perceptron is taken as the entity recognition result. The model reaches F1 scores of 90.47% on the ACE2005 Chinese dataset and 73.54% on the Weibo dataset, which verifies its effectiveness for Chinese named entity recognition.

Key words: Chinese named entity recognition, nested named entity recognition, cue tags, boundary detection
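The cue generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag strings `<b>` and `</b>`, the function names `generate_spans` and `insert_cue_tags`, the `max_len` limit, and the hard-coded boundary indices (standing in for the output of a boundary detector) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical cue tags; the abstract does not give the actual tag strings.
START_CUE = "<b>"
END_CUE = "</b>"


def generate_spans(starts: List[int], ends: List[int], max_len: int = 4) -> List[Tuple[int, int]]:
    """Pair predicted start and end indices into candidate spans (inclusive).

    A span (s, e) is kept when s <= e and it covers at most max_len characters.
    """
    return [(s, e) for s in sorted(starts) for e in sorted(ends) if s <= e < s + max_len]


def insert_cue_tags(chars: List[str], starts: List[int], ends: List[int]) -> List[str]:
    """Return a copy of the character sequence with cue tags at predicted boundaries."""
    start_set, end_set = set(starts), set(ends)
    tagged: List[str] = []
    for i, ch in enumerate(chars):
        if i in start_set:
            tagged.append(START_CUE)  # cue placed before a predicted start character
        tagged.append(ch)
        if i in end_set:
            tagged.append(END_CUE)    # cue placed after a predicted end character
    return tagged


# Assumed boundary-detector output for a sentence with nested entities:
# 贵州 (0-1) and 贵州大学 (0-3) share a start; 贵阳 spans 6-7.
chars = list("贵州大学位于贵阳")
starts, ends = [0, 6], [1, 3, 7]

spans = generate_spans(starts, ends)
tagged = insert_cue_tags(chars, starts, ends)
print(spans)
print("".join(tagged))
```

Because one start index can pair with several end indices, the candidate spans cover nested entities, and the tagged sequence is what a downstream classifier (the biaffine and multilayer perceptron module in the abstract) would consume in this sketch.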