Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (3): 77-83. DOI: 10.3778/j.issn.1002-8331.2205-0325

• Theory, Research and Development •

Research on Cascaded Labeling Framework for Relation Extraction with Self-Attention

XIAO Lizhong, ZANG Zhongxing, SONG Saisai   

  1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • Online: 2023-02-01    Published: 2023-02-01

Abstract: Named entity recognition and relation extraction are two essential subtasks in natural language processing and knowledge graph construction. To address the error propagation and entity sharing that commonly arise during relation extraction, this paper proposes Att-CasRel, a cascaded labeling framework for entity-relation extraction that incorporates a self-attention mechanism. The framework not only eliminates cascading errors but also handles sentences in which multiple relational triples share the same entity. Building on the Bert model, CB-Bert, a model suited to the Chinese medical domain, is obtained by retraining Bert on the text of the CMeIE dataset. A self-attention mechanism is incorporated at the tail-entity recognition stage to enhance the feature representation of the head-entity encoding vector, improving the model's feature extraction capability. Experimental results on the CMeIE dataset show that the proposed framework outperforms both independent (pipeline) extraction models and other joint extraction models.

Key words: natural language processing, relation extraction, self-attention mechanism, knowledge graph, Bert