Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (21): 13-29. DOI: 10.3778/j.issn.1002-8331.2204-0272

• Hot Topics and Reviews •

Review of Research on Named Entity Recognition Technologies for Electronic Medical Records

WU Zhiyan, JIN Wei, YUE Lu, SHENG Hui   

  1. College of Intelligence and Information Engineering, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
  • Online: 2022-11-01 Published: 2022-11-01

Abstract: Electronic medical records (EMR) are a product of the rapid development of medical informatization and are currently stored as unstructured text. Using natural language processing (NLP) techniques to extract the many medical entities contained in this unstructured text can help medical staff consult records more efficiently, and the recognized entities also support downstream research such as relation extraction and knowledge graph construction. This paper first introduces several commonly used datasets, corpus annotation standards, and evaluation metrics. It then describes named entity recognition methods for electronic medical records from four perspectives: early traditional methods, deep learning methods, pre-trained models, and approaches to the small-sample problem, and compares the advantages and limitations of each model. Finally, it discusses the shortcomings of current research and outlines directions for future work.

Key words: electronic medical records (EMR), natural language processing (NLP), named entity recognition, deep learning