Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (9): 48-64. DOI: 10.3778/j.issn.1002-8331.2309-0406
PENG Lin, SONG Jun, XIONG Lingzhu, DU Jianqiang, YE Qing, LIU Andong
Online: 2024-05-01
Published: 2024-04-29
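Abstract: Knowledge fusion in the medical domain aims to integrate medical knowledge scattered across separate knowledge graphs or data sources into a single, more comprehensive knowledge graph. Doing so helps improve knowledge quality, expand scale, and increase the utilization and sharing of medical knowledge. Focusing on the problems of knowledge fusion and their solutions, this paper first systematically reviews the definition, evaluation metrics, and datasets of medical knowledge fusion. It then classifies and discusses the problems and challenges that arise during the fusion process. Next, along the two dimensions of problem and technique, it surveys the strengths and weaknesses of current methods for the entity alignment and entity linking tasks, and it discusses and summarizes in detail the solutions proposed for each class of problem in medical knowledge fusion. Finally, it summarizes the field and outlines future directions for medical knowledge fusion.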
PENG Lin, SONG Jun, XIONG Lingzhu, DU Jianqiang, YE Qing, LIU Andong. Advances in Knowledge Fusion Research in Medical Domain[J]. Computer Engineering and Applications, 2024, 60(9): 48-64.