Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (16): 31-49. DOI: 10.3778/j.issn.1002-8331.2212-0251

• Hot Topics and Reviews •

Survey of Open Information Extraction Research

HU Hangle, CHENG Chunlei, YE Qing, PENG Lin, SHEN Youzhi

  1. School of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, China
  2. Key Laboratory of Artificial Intelligence in Chinese Medicine, Jiangxi University of Chinese Medicine, Nanchang 330004, China
  • Online: 2023-08-15 Published: 2023-08-15

Abstract: Open information extraction (OpenIE) aims to generate structured representations of information from natural language text in the form of relational phrases and their arguments, providing basic support for downstream tasks such as automatic knowledge base construction, open-domain question answering, and explicit reasoning. In recent years, research and applications in this field have deepened steadily, and many effective OpenIE approaches and extended models have emerged. Starting from the definition, datasets, and benchmark metrics of OpenIE, this paper surveys and compares traditional OpenIE models and neural network-based models in detail. For traditional methods, learning-based and rule-based models are introduced by category, the evaluation methods of the different models are examined in depth, and the differences between the categories are analyzed. Neural network-based models are divided, according to how they extract predicates, into two types, joint extraction and stepwise extraction, and the models of each type are reviewed and compared. The datasets commonly used for OpenIE and the main evaluation benchmarks are then summarized and compared. Finally, work on OpenIE is summarized from three perspectives (training, improvement, and application), and directions for future work are discussed.

Key words: natural language processing, open information extraction (OpenIE), neural network