Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (10): 121-131. DOI: 10.3778/j.issn.1002-8331.2301-0152

• Pattern Recognition and Artificial Intelligence •

Biomedical Event Trigger Detection Based on Two-Stage Question Answering Paradigm

XING Shuai, XIONG Yujie, SU Qianmin, HUANG Jihan   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  2. Center for Drug Clinical Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
  • Online: 2024-05-15 Published: 2024-05-15

Abstract: Existing biomedical event trigger detection methods have three shortcomings: they retain redundant information unrelated to triggers; they ignore potential correlations between entities and events; and traditional approaches are vulnerable to data scarcity. To address these problems, a biomedical event trigger detection method based on a two-stage question answering paradigm is proposed. In the event type identification stage, attention based on syntactic distance captures more meaningful contextual features and excludes interference from irrelevant information. To make effective use of the latent features in entities, word-entity-event co-occurrence features computed from global statistics guide an event-type-aware attention that mines the strong associations between words and events. In the trigger localization stage, a question is formulated for each identified event type, and its answer is the index of the corresponding trigger in the sentence, which allows rich question answering databases to be used for data augmentation. Results on the MLEE corpus show that the two-stage question answering paradigm, syntactic distance attention, and event-type-aware attention all effectively improve model performance. The proposed model achieves an F1-score of 81.39% and outperforms the other baseline models in the detailed results for multiple event types.

Key words: biomedical events, trigger detection, syntactic distance, word-entity-event co-occurrence feature, two-stage question answering paradigm
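
The abstract does not specify how syntactic distance enters the attention computation. As a rough illustration only, the Python sketch below assumes one simple reading: syntactic distance is the number of dependency edges between two tokens, and attention is restricted to tokens within a fixed distance of the current word. The function names, the hard cutoff `max_dist`, and the example sentence are all hypothetical and are not taken from the paper.

```python
from collections import deque
import math


def syntactic_distances(heads, source):
    """Distance in dependency edges from `source` to every token.

    `heads[i]` is the index of token i's head, or -1 for the root.
    The dependency tree is treated as an undirected graph.
    """
    n = len(heads)
    neighbors = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:  # breadth-first search over the tree
        node = queue.popleft()
        for nxt in neighbors[node]:
            if dist[nxt] is None:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def distance_masked_attention(scores, dist, max_dist):
    """Softmax over `scores`, keeping only tokens within `max_dist` edges.

    Assumption: a hard cutoff; the paper may weight distances differently.
    """
    kept = [i for i, d in enumerate(dist) if d is not None and d <= max_dist]
    peak = max(scores[i] for i in kept)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in kept}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


# Hypothetical sentence: "VEGF strongly induces angiogenesis"
# "induces" (index 2) is the root; the other three tokens attach to it.
heads = [2, 2, -1, 2]
dist = syntactic_distances(heads, source=0)
weights = distance_masked_attention([1.0, 1.0, 1.0, 1.0], dist, max_dist=1)
```

With `max_dist=1`, the word "VEGF" attends only to itself and to its head "induces"; tokens two edges away receive zero weight, which is one way irrelevant context could be excluded.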