Review of Application of Visual Foundation Model SAM in Medical Image Segmentation

doi:10.3778/j.issn.1002-8331.2401-0136

Abstract

Abstract: With the continued development of foundation model technology, visual foundation models, represented by the segment anything model (SAM), have achieved significant breakthroughs in image segmentation. Driven by prompts, SAM performs a range of downstream segmentation tasks and aims to solve all image segmentation problems within a unified framework. Applying SAM to medical image segmentation is therefore of great significance: its generalization ability allows it to adapt to a variety of medical images and to provide clinicians with more comprehensive anatomical and pathological information. This paper introduces commonly used image segmentation datasets and describes SAM’s network architecture and generalization capability in detail. It then analyzes SAM’s application to five major categories of medical images: whole-slide imaging, magnetic resonance imaging, computed tomography, ultrasound, and multimodal images, summarizing the strengths and weaknesses of each approach along with the corresponding improvement methods. Finally, in light of the practical problems currently facing medical image segmentation, the paper discusses and anticipates future directions for SAM’s development.

Key words: visual foundation model, segment anything model (SAM), medical images, image segmentation


SUN Xing, CAI Xiaohong, LI Ming, ZHANG Shuai, MA Jingang. Review of Application of Visual Foundation Model SAM in Medical Image Segmentation[J]. Computer Engineering and Applications, 2024, 60(17): 1-16.

