Computer Engineering and Applications, 2023, Vol. 59, Issue (16): 93-100. DOI: 10.3778/j.issn.1002-8331.2204-0508

Pattern Recognition and Artificial Intelligence

Chinese Event Extraction by Machine Reading Comprehension

WU Xu, BIAN Wenqiang, XIE Xiaqing, SUN Lijuan   

1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
3. Library, Beijing University of Posts and Telecommunications, Beijing 100876, China
4. School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China
Online: 2023-08-15; Published: 2023-08-15

Abstract: Event extraction is an important information extraction task, with applications in knowledge graph construction, financial industry analysis, and content security analysis. Existing Chinese event extraction methods are typically cascades of subtasks such as named entity recognition (NER), relation extraction (RE), and classification. Recasting event extraction as a machine reading comprehension (MRC) task lets the model draw on the prior information contained in the question. This paper proposes Chinese event extraction by machine reading comprehension (CEEMRC), a method built on a pre-trained language model that reduces Chinese event extraction to a cascade of only two question-answering models. First, questions are constructed that frame event trigger extraction, event type classification, and attribute extraction as question-answering tasks. Then, two RoBERTa-based question-answering models are built: a joint model for trigger extraction and event type classification, and a model for event attribute extraction. Trigger prior features, word segmentation information, and the relative position of the trigger word are incorporated to improve their performance. Finally, each extraction is completed by taking the answer span between the start and end positions predicted by the models. In experiments on the DuEE Chinese event dataset, the method achieves higher F1 scores for both trigger extraction and attribute extraction than comparable methods, which validates its effectiveness.
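The abstract gives no implementation details, so the following is only a minimal Python sketch, under stated assumptions, of the span-based formulation it describes: a question that carries prior information into the model, two of the auxiliary input features (word-segmentation information and the trigger's relative position), and decoding an answer from predicted start and end positions. The question wording, the BMES tagging scheme, the function names (`build_argument_question`, `segmentation_tags`, `relative_positions`, `decode_span`), and the example scores are all hypothetical; in CEEMRC the start and end scores would come from the RoBERTa-based models rather than being set by hand.

```python
from typing import List, Optional, Tuple


def build_argument_question(event_type: str, trigger: str, role: str) -> str:
    # Hypothetical template: the question carries prior information
    # (event type, trigger word, argument role) into the QA model.
    return f"In the {event_type} event triggered by '{trigger}', what is the {role}?"


def segmentation_tags(words: List[str]) -> List[str]:
    # One BMES tag per character: a common way to expose word-segmentation
    # boundaries to a character-level Chinese encoder (assumed, not from the paper).
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def relative_positions(seq_len: int, trig_start: int, trig_end: int) -> List[int]:
    # Distance of each character from the trigger span (inclusive bounds):
    # negative before it, 0 inside it, positive after it.
    positions: List[int] = []
    for i in range(seq_len):
        if i < trig_start:
            positions.append(i - trig_start)
        elif i > trig_end:
            positions.append(i - trig_end)
        else:
            positions.append(0)
    return positions


def decode_span(start_scores: List[float], end_scores: List[float],
                max_len: int = 10) -> Optional[Tuple[int, int]]:
    # Pick the (start, end) pair with the highest start + end score,
    # subject to start <= end and a maximum answer length.
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


if __name__ == "__main__":
    # "The company announced yesterday that it is laying off 300 people."
    context = "公司昨日宣布裁员三百人"
    words = ["公司", "昨日", "宣布", "裁员", "三百", "人"]
    trig_start, trig_end = 6, 7  # trigger "裁员" (layoff)

    print(build_argument_question("layoff", context[trig_start:trig_end + 1],
                                  "number of people laid off"))
    print(segmentation_tags(words))
    print(relative_positions(len(context), trig_start, trig_end))

    # Made-up scores standing in for a model's start/end logits.
    start_scores = [0.0] * len(context)
    end_scores = [0.0] * len(context)
    start_scores[8], end_scores[10] = 5.0, 4.0
    span = decode_span(start_scores, end_scores, max_len=5)
    if span is not None:
        print(context[span[0]:span[1] + 1])  # 三百人
```

Scoring start and end jointly under a length cap is a standard choice for extractive QA, since it rules out spans whose end precedes their start. A trigger question for the first model could be decoded the same way, with the predicted trigger then presumably fed into the attribute questions, which is what makes the two models a cascade.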

Key words: machine reading comprehension, question answering tasks, pre-training model, Chinese event extraction