Computer Engineering and Applications, 2022, Vol. 58, Issue (18): 284-296. DOI: 10.3778/j.issn.1002-8331.2112-0027

• Engineering and Applications

Research on Joint Extraction Method of Entity and Relation in Tourism Domain

CHEN Yun, Gulila Adonbek, MA Yajing   

1. College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
2. Xinjiang Multilingual Information Technology Laboratory, Urumqi 830017, China
Online: 2022-09-15; Published: 2022-09-15

Abstract: Extracting relational triples from text is a key task in building knowledge graphs and has received wide attention from industry and academia in recent years. To address entity nesting and relation overlapping in tourism-domain information extraction, this paper proposes BAMRel, a joint entity and relation extraction model based on the biaffine attention mechanism. The entity recognition and relation extraction parts share the parameters of the encoding layer, and each uses biaffine attention to build a classification matrix; the relation extraction part additionally fuses entity type information, which improves relation extraction and strengthens the interaction between the two tasks. In addition, a tourism-domain relation extraction dataset, TFRED, is constructed through distant supervision and manual verification. BAMRel achieves an F1 score of 91.8% on this dataset, effectively addressing entity nesting and relation overlapping. To verify the model's robustness, BAMRel is compared with mainstream joint extraction models on Baidu's DuIE dataset, where it achieves the highest F1 score, 80.2%.

Key words: joint extraction, entity nesting, relation overlapping, knowledge graph, tourism domain
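The abstract names the technique but not its equations. The following is a minimal, pure-Python sketch of how a biaffine classification matrix can represent nested entities and overlapping relations, assuming the biaffine form commonly used in dependency parsing, score(i, j, c) = h_iᵀ U_c h_j + W_c·[h_i; h_j] + b_c. It is not BAMRel's published implementation: the shared encoder, training, and the entity-type fusion step are omitted; the vectors h merely stand in for encoder outputs; indexing relations by span start tokens is a simplifying assumption; and the function names, entity labels ("LOC") and relation names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical type aliases for this sketch.
Vector = List[float]
Matrix = List[List[float]]
Tensor3 = List[List[Vector]]   # n x n x C score table
Span = Tuple[int, int, str]    # (start token, end token inclusive, label)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def biaffine_scores(h: Sequence[Vector], U: Sequence[Matrix],
                    W: Sequence[Vector], b: Sequence[float]) -> Tensor3:
    """scores[i][j][c] = h_i^T U_c h_j + W_c . [h_i; h_j] + b_c.

    Each token pair (i, j) gets its own score vector, so one table can
    hold nested spans and overlapping pairs at the same time."""
    n = len(h)
    scores = [[[0.0] * len(U) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pair = list(h[i]) + list(h[j])  # concatenation [h_i; h_j]
            for c in range(len(U)):
                bilinear = dot(h[i], matvec(U[c], h[j]))
                scores[i][j][c] = bilinear + dot(W[c], pair) + b[c]
    return scores


def decode_entities(scores: Tensor3, labels: Sequence[str],
                    none_label: str = "O") -> List[Span]:
    """Take the argmax label of every cell (i, j) with i <= j.

    Cells are classified independently, so a span lying inside a longer
    span is still returned as its own (nested) entity."""
    spans: List[Span] = []
    for i in range(len(scores)):
        for j in range(i, len(scores)):
            best = max(range(len(labels)), key=lambda c: scores[i][j][c])
            if labels[best] != none_label:
                spans.append((i, j, labels[best]))
    return spans


def decode_triples(entities: Sequence[Span], rel_scores: Tensor3,
                   relations: Sequence[str], threshold: float = 0.0
                   ) -> List[Tuple[Span, str, Span]]:
    """Multi-label decoding over (subject start, object start) cells.

    Every relation whose logit exceeds `threshold` (sigmoid > 0.5 when the
    threshold is 0) is emitted, so one entity pair may carry several
    relations and one entity may appear in several triples."""
    triples = []
    for subj in entities:
        for obj in entities:
            if subj == obj:
                continue
            cell = rel_scores[subj[0]][obj[0]]
            for r, name in enumerate(relations):
                if cell[r] > threshold:
                    triples.append((subj, name, obj))
    return triples


if __name__ == "__main__":
    # Hand-set entity scores for 3 tokens: spans (0, 2) and (1, 2) are both
    # labelled, so the second is nested inside the first.
    ent = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
    ent[0][2] = [0.0, 1.0]
    ent[1][2] = [0.0, 1.0]
    print(decode_entities(ent, ["O", "LOC"]))  # [(0, 2, 'LOC'), (1, 2, 'LOC')]
```

Because every cell is scored independently, nesting and overlap need no special decoding rules in this sketch; the price is scoring all n × n token pairs (times the label count) for a sentence of n tokens.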