Computer Engineering and Applications, 2023, Vol. 59, Issue (21): 39-51. DOI: 10.3778/j.issn.1002-8331.2303-0237
刘安栋,彭琳,叶青,杜建强,程春雷,查青林
LIU Andong, PENG Lin, YE Qing, DU Jianqiang, CHENG Chunlei, ZHA Qinglin
Online:
2023-11-01
Published:
2023-11-01
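Abstract: Named entity recognition (NER) for electronic medical records aims to identify medical entities in electronic medical record text and assign them to predefined medical entity categories, supporting downstream natural language processing tasks such as medical relation extraction, medical information retrieval, and medical intelligent question answering. This paper systematically reviews the definition, annotation methods, evaluation metrics, and difficulties of electronic medical record NER; surveys the strengths and weaknesses of each class of electronic medical record NER method from two perspectives, the difficulties of the task and the history of its technical development; reviews in detail the evaluation tasks and datasets for medical-domain NER in China; discusses and summarizes solutions to each class of difficulty in electronic medical record NER; and closes with a summary and an outlook on future directions for NER in the medical domain.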
LIU Andong, PENG Lin, YE Qing, DU Jianqiang, CHENG Chunlei, ZHA Qinglin. Advances in Named Entity Recognition in Electronic Medical Record[J]. Computer Engineering and Applications, 2023, 59(21): 39-51.