Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (23): 136-144. DOI: 10.3778/j.issn.1002-8331.2207-0455

• Pattern Recognition and Artificial Intelligence •

Named Entity Recognition of Chinese Electronic Medical Records Based on Multi-Feature Fusion

SUN Zhen, LI Xinfu

  1. College of Cyberspace Security and Computer, Hebei University, Baoding, Hebei 071000, China
  • Online: 2023-12-01 Published: 2023-12-01

Abstract: Named entity recognition (NER) is a fundamental task in natural language processing. Existing work on NER for Chinese electronic medical records does not account for the complex structure of medical text or the imbalanced distribution of entity types in the datasets. It simply transfers general-domain NER models to the medical domain, and recognition performance suffers as a result. To address these problems, this paper proposes a multi-feature fusion NER model for Chinese electronic medical records. First, character, radical, and four-corner code vectors are obtained, so that the glyph shape of Chinese characters enriches the semantic representation of medical text. Second, an entity label tagging strategy marks the entity types that may be present in the vectors, strengthening the model's learning of different types of text data. Finally, the fused vector is fed into a Mogrifier GRU layer to further strengthen the semantic connections among the feature representations, and a CRF is used to impose label constraints. Experiments show that the proposed model reaches an F1 score of 88.72% on the CCKS2019 dataset and 95.44% on the MSRA dataset, which verifies its effectiveness.

Key words: electronic medical records, named entity recognition, multi-feature, gated recurrent unit (GRU)
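The abstract names two steps that are easier to follow in code: fusing character, radical, and four-corner features into one vector, and passing that vector through Mogrifier-style gating before the recurrent update. The sketch below is only an illustration under several assumptions. The abstract does not say how the features are fused, so simple concatenation is assumed. The gating follows the general Mogrifier scheme, in which the input and the previous hidden state alternately rescale each other. All function names, dimensions, random weights, and the placeholder four-corner key are hypothetical and not taken from the paper. The GRU update, the entity label tagging strategy, and the CRF layer are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_embedding(keys, dim, rng):
    """Toy embedding table with random values (stand-in for trained vectors)."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


def fuse(char, radical, corner, char_emb, radical_emb, corner_emb):
    """Assumed fusion: concatenate the three per-character feature vectors."""
    return char_emb[char] + radical_emb[radical] + corner_emb[corner]


def mogrify(x, h, q_mats, r_mats):
    """Alternately rescale x (gated by h) and h (gated by x).

    q_mats[k] has shape len(x) by len(h); r_mats[k] has shape len(h) by len(x).
    Each gate is 2 * sigmoid(.), so an all-zero matrix leaves its target unchanged.
    """
    if len(q_mats) - len(r_mats) not in (0, 1):
        raise ValueError("need as many Q matrices as R matrices, or one more")
    for i in range(len(q_mats) + len(r_mats)):
        if i % 2 == 0:
            gate = matvec(q_mats[i // 2], h)
            x = [2.0 * sigmoid(g) * xi for g, xi in zip(gate, x)]
        else:
            gate = matvec(r_mats[i // 2], x)
            h = [2.0 * sigmoid(g) * hi for g, hi in zip(gate, h)]
    return x, h


rng = random.Random(0)
char_emb = make_embedding(["病"], 4, rng)
radical_emb = make_embedding(["疒"], 2, rng)
# "code_0" is a placeholder key, not a real four-corner code.
corner_emb = make_embedding(["code_0"], 2, rng)

x = fuse("病", "疒", "code_0", char_emb, radical_emb, corner_emb)
h = [0.5, -0.5, 0.25]  # hypothetical previous hidden state
q_mats = [random_matrix(len(x), len(h), rng) for _ in range(2)]
r_mats = [random_matrix(len(h), len(x), rng) for _ in range(2)]
x_mod, h_mod = mogrify(x, h, q_mats, r_mats)
print(len(x_mod), len(h_mod))
```

In a full model along these lines, `x_mod` and `h_mod` would replace the raw input and hidden state in the GRU cell update, and the resulting hidden states would be scored by a CRF layer to produce the final tag sequence.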