Computer Engineering and Applications, 2023, Vol. 59, Issue (13): 164-170. DOI: 10.3778/j.issn.1002-8331.2204-0327

• Pattern Recognition and Artificial Intelligence •

Fast Model for Joint Extraction of Entity and Relation

YANG Dong, TIAN Shengwei, YU Long, ZHOU Tiejun, WANG Bo   

  1. College of Software, Xinjiang University, Urumqi 830000, China
  2. Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi 830000, China
  3. Xinjiang Internet Information Center, Urumqi 830000, China
  • Online: 2023-07-01 Published: 2023-07-01

Abstract: Extracting entities and relations from plain text is a key technique for knowledge and question answering tasks. Traditional multi-head models predict a relation type for every pair of segments, but because relations are sparse, negative labels far outnumber positive ones. This approach also makes the computational cost grow with the square of the sentence length, which limits the model's practicality. To address these problems, a fast joint entity and relation extraction model is proposed. For named entity recognition, two pointer networks predict the start and end labels of entities, respectively. For relation extraction, semantic segment pairs that do not contain an entity end label are discarded, which reduces the number of segment pairs and speeds up inference. To demonstrate the model's effectiveness, experiments are conducted on the English news dataset ACE05 and the Dutch real estate dataset DREC. The results show that the model achieves competitive performance compared with the baseline model, and that its inference speed improves by about 1.4 times on ACE05 and about 2.1 times on DREC.

Key words: entity recognition, relation extraction, neural network, natural language processing, information extraction
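
The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the nearest-end span decoding rule, and the reading that a pair is kept only when both of its positions are predicted entity ends are all assumptions made for illustration, and the probabilities are made-up inputs standing in for the outputs of the two pointer networks.

```python
from itertools import product


def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each predicted start with the nearest predicted end at or after it.

    Hypothetical decoding rule; the abstract does not specify one.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        e = next((e for e in ends if e >= s), None)
        if e is not None:
            spans.append((s, e))
    return spans


def all_pairs(n_tokens):
    """Every ordered token pair, as a multi-head model would score them."""
    return list(product(range(n_tokens), repeat=2))


def candidate_pairs(end_probs, threshold=0.5):
    """Keep only ordered pairs of distinct predicted entity-end positions.

    Assumption: a pair survives only if both positions carry an end label.
    """
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i, j in product(ends, repeat=2) if i != j]


# Made-up pointer-network outputs for an 8-token sentence.
start_probs = [0.9, 0.1, 0.0, 0.1, 0.7, 0.2, 0.1, 0.0]
end_probs = [0.1, 0.9, 0.0, 0.2, 0.1, 0.8, 0.1, 0.0]

spans = decode_spans(start_probs, end_probs)
dense = all_pairs(len(end_probs))
pruned = candidate_pairs(end_probs)
print(spans, len(dense), len(pruned))
```

With these inputs the dense scheme scores all 64 ordered pairs, while the pruned scheme scores only the two pairs formed by the two predicted end positions, which is the source of the inference speed-up the abstract reports.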