Research on Extraction of Biomedical Entity Relation Based on Literature Mining

doi:10.3778/j.issn.1002-8331.1912-0489

Computer Engineering and Applications, 2021, Vol. 57, Issue (7): 115-120. DOI: 10.3778/j.issn.1002-8331.1912-0489



CHEN Wei, XU Yun

1.School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
2.Key Laboratory of High Performance Computing of Anhui Province, Hefei 230026, China

Online:2021-04-01 Published:2021-04-02


Abstract:

Biomedical researchers often search large volumes of literature for interactions between biological entities, such as drug-drug and chemical-protein interactions. With the rapid growth of the biomedical literature and advances in deep learning, automatically extracting such interactions from text has shown great potential. Previous deep learning methods have achieved some success but have three problems: they use static word vectors, which cannot distinguish the different senses of a polysemous word; they do not weight individual words, so extraction performs poorly on long sentences; and they address sample imbalance by ensembling several models, which makes the overall system complex. This paper therefore proposes a deep multi-channel CNN (MCCNN) model based on a residual structure. The model uses BERT (Bidirectional Encoder Representations from Transformers) to generate dynamic word vectors that represent word semantics more accurately, uses multi-head attention to capture dependencies in long sentences, and replaces model ensembling with a purpose-designed ranking loss function to reduce the impact of sample imbalance. Experiments on several datasets show that the proposed method performs well.

Keywords: biomedical literature, relation extraction, attention mechanism, multi-channel
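The abstract names a ranking loss as the replacement for model ensembling but does not give its formula. As a rough illustration only, the sketch below assumes a pairwise margin-based ranking loss of the form popularized by the CR-CNN relation classifier; the function name `ranking_loss`, the margins `m_pos` and `m_neg`, and the scale `gamma` are illustrative assumptions, not values taken from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranking_loss(scores, gold, m_pos=2.5, m_neg=0.5, gamma=2.0):
    """Pairwise ranking loss for a single sentence (assumed form).

    scores: one score per real relation class.
    gold:   index of the correct class, or None for "no interaction".
    The defaults for m_pos, m_neg and gamma are illustrative only.
    """
    # Scores of all incorrect classes; when gold is None every class is incorrect.
    negatives = [s for i, s in enumerate(scores) if i != gold]
    loss = 0.0
    if negatives:
        # Push the best-scoring wrong class below -m_neg.
        loss += softplus(gamma * (m_neg + max(negatives)))
    if gold is not None:
        # Push the correct class above m_pos; skipped for "no interaction".
        loss += softplus(gamma * (m_pos - scores[gold]))
    return loss


if __name__ == "__main__":
    print(ranking_loss([5.0, -5.0], gold=0))     # correct class ranked first
    print(ranking_loss([-5.0, 5.0], gold=0))     # correct class ranked last
    print(ranking_loss([-3.0, -4.0], gold=None)) # "no interaction" example
```

Under this assumed form, a sentence with no interaction contributes only the term that pushes every real class score down, so the usually dominant negative class needs no score of its own. That is one plausible way a single model could cope with imbalance without an ensemble.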


CHEN Wei, XU Yun. Research on Extraction of Biomedical Entity Relation Based on Literature Mining[J]. Computer Engineering and Applications, 2021, 57(7): 115-120.

