Computer Engineering and Applications, 2020, Vol. 56, Issue 5: 153-159. DOI: 10.3778/j.issn.1002-8331.1909-0211

• Pattern Recognition and Artificial Intelligence •

Named Entity Recognition in Chinese Electronic Medical Records Using Transformer-CRF

LI Bo (李博), KANG Xiaodong (康晓东), ZHANG Huali (张华丽), WANG Yage (王亚鸽), CHEN Yayuan (陈亚媛), BAI Fang (白放)

  1. College of Medical Imaging, Tianjin Medical University, Tianjin 300203, China
  • Online: 2020-03-01 Published: 2020-03-06

Abstract:

Named entity recognition is one of the basic tasks of natural language processing. To address the poor performance of traditional models on named entity recognition in Chinese electronic medical records (EMRs), a neural network model based entirely on the attention mechanism is proposed. First, a self-built dataset of real Chinese electronic medical records is preprocessed by manual annotation and word segmentation. Second, a Transformer model is trained and optimized to extract text features. Finally, a conditional random field (CRF) classifies and recognizes the extracted text features. To verify the effectiveness of the proposed method, the resulting Transformer-CRF model is compared with seven traditional models, and recognition performance is evaluated with three metrics: precision, recall and F1 value. The experimental results show that, on the same corpus, the Transformer-CRF model recognizes body-part entities well, with an F1 value of 95.02%. Its precision, recall and F1 value are also higher than those of the seven traditional models, which supports, to a certain extent, the conclusion that the model has good recognition performance.

Key words: electronic medical records (EMR), named entity recognition, Transformer, conditional random fields (CRF)
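
The three metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state the tagging scheme or the matching rule, so this example assumes BIO tags and exact-match, entity-level scoring; the labels `BODY` and `SYMPTOM` and both helper functions are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel flushes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag continues the open entity only if the type matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the entity that is currently open.
        if label is not None:
            spans.add((label, start, i))
            start, label = None, None
        # A B- tag opens a new entity; a stray I- tag is ignored.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if type and span both match."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with made-up tags (not data from the paper).
gold_tags = ["B-BODY", "I-BODY", "O", "B-SYMPTOM", "O", "B-BODY"]
pred_tags = ["B-BODY", "I-BODY", "O", "B-SYMPTOM", "I-SYMPTOM", "O"]
p, r, f1 = precision_recall_f1(gold_tags, pred_tags)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In the toy example only the first `BODY` entity matches exactly: the predicted `SYMPTOM` span is one token too long and the final `BODY` entity is missed, so one of two predictions and one of three gold entities are correct.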