Advances in Named Entity Recognition in Electronic Medical Record

doi:10.3778/j.issn.1002-8331.2303-0237

Abstract

Named entity recognition (NER) for electronic medical records aims to identify medical entities in electronic medical record text and classify them into predefined medical entity categories. It supports downstream natural language processing tasks such as medical relation extraction, medical information retrieval, and medical intelligent question answering. First, the definition, annotation methods, evaluation metrics, and difficulties of electronic medical record NER are systematically reviewed. Second, the advantages and shortcomings of each class of electronic medical record NER methods are surveyed from two perspectives: the difficulties of the task and the course of its technical development. Then, the evaluation tasks and datasets for NER in the domestic medical field are reviewed in detail. Next, solutions to each type of difficulty in electronic medical record NER are discussed and summarized. Finally, the paper is summarized and future directions for NER in the medical field are outlined.

Key words: natural language processing, electronic medical record, named entity recognition

LIU Andong, PENG Lin, YE Qing, DU Jianqiang, CHENG Chunlei, ZHA Qinglin. Advances in Named Entity Recognition in Electronic Medical Record[J]. Computer Engineering and Applications, 2023, 59(21): 39-51.
