Computer Engineering and Applications, 2023, Vol. 59, Issue (9): 130-139. DOI: 10.3778/j.issn.1002-8331.2112-0418

• Pattern Recognition and Artificial Intelligence •

Joint Extraction of Entities and Relations Model for Single-Step Span-Labeling

ZHENG Zhaoqian, HAN Dongchen, ZHAO Hui   

  1. School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
  • Online: 2023-05-01 Published: 2023-05-01

单步片段标注的实体关系联合抽取模型

郑肇谦,韩东辰,赵辉   

  1. 长春工业大学 计算机科学与工程学院,长春 130012

Abstract: Relation extraction is an upstream task for knowledge graphs and many other fields, has wide application value, and has received extensive attention in recent years. Relation extraction models commonly suffer from exposure bias, and the texts they process commonly contain nested and overlapping entities; these problems seriously degrade model performance. This paper therefore proposes a span-labeling based model (SLM) for the joint extraction of entities and relations. The model (1) recasts the entity-relation extraction problem as a span labeling problem; (2) uses a sliding window and three mapping strategies to combine and arrange the token sequence and re-tile it into a span sequence; (3) uses an LSTM and a multi-head self-attention mechanism to extract deep semantic features of the spans; and (4) defines entity-relation labels and classifies them with a multi-layer labeling method. In experiments on the English datasets NYT and WebNLG, the F1 score improves significantly over the baseline model, which verifies the effectiveness of the model and indicates that it can effectively address the problems above.

Key words: relation extraction, joint extraction, span-labeling, mapping strategy, exposure bias, entity nesting, entity overlap

摘要: 关系抽取作为知识图谱等诸多领域的上游任务,具有广泛应用价值,近年来受到广泛关注。关系抽取模型普遍存在暴露偏差问题,抽取文本普遍存在实体嵌套和实体重叠问题,这些问题严重影响了模型性能。因此,提出了一种基于片段标注的实体关系联合抽取模型(span-labeling based model,SLM),主要包括:将实体关系抽取问题转化为片段标注问题;使用滑动窗口和三种映射策略将词元(token)序列进行组合排列重新平铺成片段(span)序列;使用LSTM和多头自注意力机制进行片段深层语义特征提取;设计了实体关系标签,使用多层标注方法进行关系标签分类。在英文数据集NYT、WebNLG上进行实验,相对于基线模型F1值显著提高,验证了模型的有效性,能有效解决上述问题。

关键词: 关系抽取, 联合抽取, 片段标注, 映射策略, 暴露偏差, 实体嵌套, 实体重叠
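
The span-construction step described in the abstract (a sliding window that re-tiles the token sequence into a span sequence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `tile_spans` and `span_text`, the `max_span_len` parameter, and the width-by-width ordering are all assumptions made for illustration, and the three mapping strategies are not reproduced because the abstract does not specify them.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def tile_spans(tokens: List[str], max_span_len: int) -> List[Span]:
    """Enumerate every contiguous span of up to max_span_len tokens.

    A window of width 1 slides over the tokens, then a window of width 2,
    and so on, producing one flat span sequence. This ordering is an
    illustrative assumption, not the paper's mapping strategy.
    """
    if max_span_len < 1:
        raise ValueError("max_span_len must be at least 1")
    spans: List[Span] = []
    for width in range(1, min(max_span_len, len(tokens)) + 1):
        for start in range(len(tokens) - width + 1):
            spans.append((start, start + width - 1))
    return spans


def span_text(tokens: List[str], span: Span) -> str:
    """Return the surface text covered by a span."""
    start, end = span
    return " ".join(tokens[start:end + 1])


# Hypothetical example sentence, not taken from NYT or WebNLG.
tokens = ["New", "York", "City", "is", "large"]
spans = tile_spans(tokens, max_span_len=3)
```

Because every span receives its own position in the flat sequence, a nested pair such as "New York" and "New York City" corresponds to two distinct items, so each can in principle be labeled independently.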