Computer Engineering and Applications, 2023, Vol. 59, Issue (3): 150-157. DOI: 10.3778/j.issn.1002-8331.2108-0353

• Pattern Recognition and Artificial Intelligence •

Three-Stage Document-Level Event Extraction for COVID-19 News

GUO Xin, GAO Caixiang, CHEN Qian, WANG Suge, WANG Xuejing   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Ministry of Education for Computational Intelligence and Chinese Information Processing of Shanxi University, Taiyuan 030006, China
  • Online: 2023-02-01 Published: 2023-02-01

面向新冠新闻的三阶段篇章级事件抽取方法

郭鑫,高彩翔,陈千,王素格,王雪婧   

  1. 山西大学 计算机与信息技术学院,太原 030006
  2. 山西大学 计算智能与中文信息处理教育部重点实验室,太原 030006

Abstract: Event extraction is an active research topic in information extraction. For COVID-19 news, event extraction can filter out valuable information. However, the field lacks a well-annotated COVID-19 news dataset. Moreover, because some events are complex, their arguments are not confined to a single sentence, and multiple sentences are needed to describe an event fully. This paper therefore constructs a COVID-19 news event dataset and proposes a three-stage pipeline method. The method first classifies event types, then extracts event sentences, and finally performs document-level event argument extraction. Experimental results show that the method reduces event classification time. When two event sentences are extracted, argument recognition is best for notification-type events, with precision, recall, and F1 reaching 75.0%, 73.0%, and 74.0%, respectively, which shows that the proposed method can effectively extract document-level COVID-19 events.
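The three reported figures are mutually consistent if the first one is read as precision, since F1 is the harmonic mean of precision and recall. A minimal check follows; the helper name `f1_score` is our own and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for notification-type event arguments.
reported_f1 = f1_score(0.75, 0.73)
print(f"{reported_f1:.3f}")  # prints 0.740
```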

Key words: COVID-19, information extraction, event sentence extraction, document-level event extraction
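The control flow of the three-stage pipeline described in the abstract can be sketched as below. This is only an illustration of how the stages chain together, not the authors' models: the event types, trigger words, regular expressions, and every function name are hypothetical stand-ins, since the abstract does not describe the components used at each stage.

```python
import re
from dataclasses import dataclass, field

# Hypothetical event types and trigger words (assumption, not the paper's schema).
TRIGGERS = {
    "notification": ["confirmed", "reported", "cases"],
    "policy": ["lockdown", "restriction", "quarantine"],
}


@dataclass
class Event:
    event_type: str
    sentences: list
    arguments: dict = field(default_factory=dict)


def split_sentences(doc: str) -> list:
    """Naive splitter on sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def score(text: str, event_type: str) -> int:
    """Count trigger-word occurrences for one event type."""
    lowered = text.lower()
    return sum(lowered.count(word) for word in TRIGGERS[event_type])


def classify_event_type(doc: str) -> str:
    """Stage 1: pick the event type with the most trigger hits."""
    return max(TRIGGERS, key=lambda t: score(doc, t))


def extract_event_sentences(doc: str, event_type: str, k: int = 2) -> list:
    """Stage 2: keep the k highest-scoring sentences, in document order."""
    sentences = split_sentences(doc)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], event_type),
                    reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


def extract_arguments(sentences: list) -> dict:
    """Stage 3: pull arguments from the joined event sentences (toy regexes)."""
    text = " ".join(sentences)
    arguments = {}
    count = re.search(r"(\d+)\s+(?:new\s+)?(?:confirmed\s+)?cases", text)
    if count:
        arguments["case_count"] = count.group(1)
    place = re.search(r"\bin\s+([A-Z][a-z]+)", text)
    if place:
        arguments["location"] = place.group(1)
    return arguments


def run_pipeline(doc: str, k: int = 2) -> Event:
    event_type = classify_event_type(doc)
    sentences = extract_event_sentences(doc, event_type, k)
    return Event(event_type, sentences, extract_arguments(sentences))


doc = ("Health authorities held a press briefing on Monday. "
       "A total of 12 new confirmed cases were reported. "
       "The cases are all being treated in Taiyuan.")
event = run_pipeline(doc)
print(event.event_type, event.arguments)
```

In this toy document the case count and the location sit in different sentences, which is the situation the abstract gives as the reason for extracting more than one event sentence before argument extraction.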

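Abstract (translated from the Chinese): Event extraction is a research hotspot in the field of information extraction. With the COVID-19 epidemic having become a long-term, routine condition, event extraction technology can be used to filter out valuable information. However, the event extraction field lacks a finely annotated COVID-19 news training dataset, and because some events are complex, their arguments do not appear in a single sentence only, so several sentences are needed to describe one event completely. Therefore, a COVID-19 news dataset is first constructed, and a three-stage pipeline method is then proposed to extract COVID-19 events from documents. The method classifies the dataset by event type, extracts event sentences, and then performs document-level argument extraction. Experimental results show that the proposed method reduces event classification time. When two event sentences are extracted, argument recognition is best for data-notification events, with precision, recall, and F1 reaching 75.0%, 73.0%, and 74.0%, which demonstrates that the method can effectively extract document-level events related to COVID-19.

Key words (translated from the Chinese): COVID-19, information extraction, event sentence extraction, document-level event extraction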