Computer Engineering and Applications, 2023, Vol. 59, Issue (16): 31-49. DOI: 10.3778/j.issn.1002-8331.2212-0251
HU Hangle, CHENG Chunlei, YE Qing, PENG Lin, SHEN Youzhi
Online: 2023-08-15
Published: 2023-08-15
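Abstract: Open information extraction (OpenIE) aims to generate structured representations of the information in natural language text, in the form of relation phrases and their arguments. These representations provide foundational support for downstream tasks such as automated knowledge base construction, open-domain question answering, and explicit reasoning. In recent years, research on and applications of OpenIE have steadily deepened, producing many effective research directions and extended models. Starting from the definition of OpenIE, its datasets, and its benchmark metrics, this survey reviews and compares traditional OpenIE models and neural-network-based models in detail. Traditional methods are divided into learning-based and rule-based models. The evaluation methods of the individual models are examined, and the differences between the two categories are analyzed. Neural models are divided into two types, joint extraction and step-wise (pipelined) extraction, according to how they extract predicates, and each type is reviewed and compared. The datasets commonly used for OpenIE and the main evaluation benchmarks are then summarized and compared. Finally, OpenIE work is summarized from three perspectives (training, improvement, and application), and future directions are discussed.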
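The output format described in the abstract is a relation phrase plus its arguments. A minimal Python sketch of a toy rule-based extractor makes this concrete. The sketch is a heavily simplified take on the relation-phrase constraint popularized by ReVerb (Fader et al., 2011). It is not a method described in this survey and not ReVerb itself. It makes several assumptions:

- The sentence is already POS-tagged with Penn Treebank tags, supplied by hand here.
- The relation phrase is a run of verbs, optionally closed by one preposition or particle.
- The arguments are the runs of noun-like tags directly to the left and right of the relation.

The `Extraction` class, `extract` function and tag sets are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A token is (word, Penn Treebank POS tag); tagging is assumed to happen upstream.
Token = Tuple[str, str]

VERB_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
PREP_TAGS = {"IN", "TO", "RP"}  # prepositions, infinitival "to", particles
NP_TAGS = {"DT", "JJ", "CD", "NN", "NNS", "NNP", "NNPS", "PRP"}  # crude noun-phrase tags


@dataclass(frozen=True)
class Extraction:
    """One OpenIE tuple: a relation phrase and its two arguments (hypothetical type)."""
    arg1: str
    rel: str
    arg2: str


def _text(tokens: List[Token], start: int, end: int) -> str:
    return " ".join(word for word, _ in tokens[start:end])


def extract(tokens: List[Token]) -> List[Extraction]:
    """Return (arg1, rel, arg2) tuples using a simplified verb+ (prep)? relation pattern."""
    results: List[Extraction] = []
    n = len(tokens)
    i = 0
    while i < n:
        if tokens[i][1] not in VERB_TAGS:
            i += 1
            continue
        # Relation phrase: maximal run of verbs, optionally closed by one preposition/particle.
        rel_start = i
        while i < n and tokens[i][1] in VERB_TAGS:
            i += 1
        if i < n and tokens[i][1] in PREP_TAGS:
            i += 1
        rel_end = i
        # Left argument: the run of noun-phrase tags directly before the relation.
        arg1_start = rel_start
        while arg1_start > 0 and tokens[arg1_start - 1][1] in NP_TAGS:
            arg1_start -= 1
        # Right argument: the run of noun-phrase tags directly after the relation.
        arg2_end = rel_end
        while arg2_end < n and tokens[arg2_end][1] in NP_TAGS:
            arg2_end += 1
        # Keep the tuple only if both argument slots are non-empty.
        if arg1_start < rel_start and arg2_end > rel_end:
            results.append(Extraction(
                _text(tokens, arg1_start, rel_start),
                _text(tokens, rel_start, rel_end),
                _text(tokens, rel_end, arg2_end),
            ))
    return results


if __name__ == "__main__":
    sentence = [("Einstein", "NNP"), ("was", "VBD"), ("born", "VBN"),
                ("in", "IN"), ("Ulm", "NNP"), (".", ".")]
    print(extract(sentence))
    # prints [Extraction(arg1='Einstein', rel='was born in', arg2='Ulm')]
```

Practical rule-based systems go much further than this sketch. They add lexical constraints, chunkers or dependency parses, and clause splitting. Neural systems, in contrast, learn to predict the argument and relation spans. The sketch only shows the shape of the (arg1, relation, arg2) output that both families produce.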
HU Hangle, CHENG Chunlei, YE Qing, PENG Lin, SHEN Youzhi. Survey of Open Information Extraction Research[J]. Computer Engineering and Applications, 2023, 59(16): 31-49.