Computer Engineering and Applications, 2024, Vol. 60, Issue (6): 188-198. DOI: 10.3778/j.issn.1002-8331.2211-0094

• Pattern Recognition and Artificial Intelligence •

Medical Named Entity Recognition Based on Multi-Feature and Co-Attention

LIU Xinning   

  1. Department of Software, Dalian Neusoft University of Information, Dalian, Liaoning 116023, China
  • Online: 2024-03-15 Published: 2024-03-15

Abstract: Current Chinese medical named entity recognition (NER) methods do not fuse the feature information unique to medical text, which keeps recognition accuracy from improving effectively, and their reliance on a single attention mechanism weakens entity classification. To address both problems, a Chinese medical NER method based on multi-feature fusion and a co-attention mechanism is proposed. First, a pre-trained model produces vector representations of the original medical text, and a bidirectional gated recurrent unit network (BiGRU) extracts character-level feature vectors. Second, because medical named entities have distinctive radical features, an iterated dilated convolutional neural network (IDCNN) extracts radical-level feature vectors. Finally, a co-attention network integrates the two sets of feature vectors to generate dual correlation features for <character-radical> pairs, and a conditional random field (CRF) outputs the entity recognition results. Experimental results on the CCKS dataset show that the method achieves higher accuracy, recall, and F1 score than other entity recognition models. Although the recognition model is more complex, its performance does not decrease noticeably.

Key words: Chinese medical text, named entity recognition, multi-feature fusion, co-attention mechanism, bidirectional encoder representations from Transformers (BERT)
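
The abstract does not give the equations of the co-attention network, so the following is only a minimal sketch of one common co-attention formulation, not the paper's exact method. It assumes the character-level features (for example, BiGRU outputs) and radical-level features (for example, IDCNN outputs) have the same dimension and the same sequence length, one radical per character. The function name `co_attention`, the dot-product affinity, and the concatenation used to build the fused <character-radical> feature are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of vectors scaled by their attention weights."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def co_attention(char_feats, radical_feats):
    """Fuse character and radical features with attention in both directions.

    Assumption: both inputs are lists of n vectors with the same dimension.
    Returns the fused vectors plus the two attention maps.
    """
    if len(char_feats) != len(radical_feats):
        raise ValueError("expected one radical vector per character")

    # affinity[i][j]: similarity between character i and radical j
    affinity = [[dot(c, r) for r in radical_feats] for c in char_feats]

    # Each character attends over all radicals (row-wise softmax).
    char_to_rad = [softmax(row) for row in affinity]
    rad_context = [weighted_sum(w, radical_feats) for w in char_to_rad]

    # Each radical attends over all characters (column-wise softmax).
    rad_to_char = [softmax(list(col)) for col in zip(*affinity)]
    char_context = [weighted_sum(w, char_feats) for w in rad_to_char]

    # Fused feature per position: [character ; radical context ; character context]
    fused = [c + rc + cc for c, rc, cc in zip(char_feats, rad_context, char_context)]
    return fused, char_to_rad, rad_to_char


# Toy input: two positions with 2-dimensional features (made-up values).
char_feats = [[1.0, 0.0], [0.0, 1.0]]
radical_feats = [[1.0, 0.0], [0.0, 1.0]]
fused, char_to_rad, rad_to_char = co_attention(char_feats, radical_feats)
```

In a full tagger, the fused vectors would be projected to per-tag scores and passed to the CRF layer for decoding. Trainable projection matrices are omitted here to keep the sketch dependency-free.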