MSMVT: Semi-Supervised Framework with Multi-Scale and Multi-View Transformer for Medical Image Segmentation

doi:10.3778/j.issn.1002-8331.2309-0022

Abstract

In recent years, the Transformer has achieved remarkable performance across many supervised computer vision tasks, but its effectiveness in semi-supervised medical image segmentation remains limited because high-quality medical image annotations are scarce. This paper proposes MSMVT, a semi-supervised medical image segmentation framework built on a multi-scale and multi-view Transformer. Motivated by the strong results of contrastive learning in Transformer pre-training, the paper designs a pseudo-label-guided multi-scale prototype contrastive learning module. This module uses image pyramid data augmentation to generate semantically rich multi-scale prototype representations for unlabeled images, and contrastive learning then reinforces the consistency between prototypes at different scales, alleviating the insufficient Transformer training caused by label scarcity. Furthermore, to stabilize Transformer training, the paper proposes a multi-view consistency learning strategy in which a weakly augmented view is used to correct multiple strongly augmented views. By minimizing the output discrepancies between these views, the model maintains multi-level consistency across different perturbations. Experimental results show that, with an annotation ratio of only 10%, MSMVT achieves Dice similarity coefficients (DSC) of 88.93%, 84.75%, and 85.38% on the three public datasets ACDC, LIDC, and ISIC, respectively, outperforming existing semi-supervised medical image segmentation methods.
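The pseudo-label-guided multi-scale prototype contrastive module can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the prototype extractor, the similarity, or the loss, so the sketch assumes masked average pooling over pseudo-label regions, cosine similarity, and an InfoNCE loss with temperature `tau`, and it approximates the image pyramid by average-pooling a single feature map instead of encoding each pyramid level with the Transformer. The function names, the pyramid factors `(2, 4)`, and `tau=0.1` are hypothetical.

```python
import numpy as np


def avg_pool(x, factor):
    """Downsample a (C, H, W) array by an integer factor using average pooling.
    Assumes H and W are divisible by `factor`."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def class_prototypes(features, pseudo_labels, num_classes):
    """Masked average pooling: one L2-normalised prototype per class.

    features: (C, H, W) feature map; pseudo_labels: (H, W) integer map.
    Returns (num_classes, C) prototypes and a boolean mask of present classes."""
    c = features.shape[0]
    flat = features.reshape(c, -1)
    labels = pseudo_labels.reshape(-1)
    protos = np.zeros((num_classes, c))
    present = np.zeros(num_classes, dtype=bool)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            v = flat[:, mask].mean(axis=1)            # mean feature of class-k pixels
            protos[k] = v / (np.linalg.norm(v) + 1e-8)
            present[k] = True
    return protos, present


def prototype_infonce(p_a, p_b, present, tau=0.1):
    """InfoNCE between prototypes at two scales: the same class at the other scale
    is the positive, every other present class is a negative."""
    idx = np.flatnonzero(present)
    logits = p_a[idx] @ p_b[idx].T / tau                 # cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # positives lie on the diagonal


def multiscale_prototype_loss(features, pseudo_labels, num_classes, factors=(2, 4), tau=0.1):
    """Contrast full-resolution prototypes with prototypes from each coarser pyramid level.
    Hypothetical simplification: pooling one feature map stands in for encoding each level."""
    base, base_present = class_prototypes(features, pseudo_labels, num_classes)
    losses = []
    for f in factors:
        feats_s = avg_pool(features, f)
        labels_s = pseudo_labels[::f, ::f]               # nearest-neighbour label downsampling
        p_s, present_s = class_prototypes(feats_s, labels_s, num_classes)
        both = base_present & present_s
        if both.sum() >= 2:                              # need at least one negative class
            losses.append(prototype_infonce(base, p_s, both, tau))
    return float(np.mean(losses)) if losses else 0.0


# Toy example: two classes split left/right with perfectly separable features.
labels = np.zeros((8, 8), dtype=int)
labels[:, 4:] = 1
feats = np.stack([(labels == 0).astype(float), (labels == 1).astype(float)])
print(multiscale_prototype_loss(feats, labels, num_classes=2))  # close to 0
```

Same-class prototypes at different scales serve as positives and other classes as negatives, so the loss stays small only when each class representation is stable across scales and well separated from the other classes.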
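The multi-view consistency strategy, in which a weakly augmented view corrects several strongly augmented views, can be sketched in the style of confidence-thresholded weak-to-strong pseudo-labeling (as in FixMatch). This is an assumption: the abstract does not state how the correction is performed, so the sketch uses the weak view's confident argmax as a hard target for every strong view and averages the masked cross-entropy. The names `log_softmax` and `weak_to_strong_loss` and the `threshold=0.95` default are hypothetical, and the strong views are assumed to be spatially aligned with the weak view (for example, photometric perturbations only).

```python
import numpy as np


def log_softmax(logits, axis=0):
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def weak_to_strong_loss(weak_logits, strong_logits_list, threshold=0.95):
    """Supervise several strongly augmented views with the weak view's pseudo-label.

    weak_logits: (K, H, W) logits of the weakly augmented view, used only as a fixed
    target (in a real training loop no gradient would flow through it).
    strong_logits_list: list of (K, H, W) logits of strongly augmented views.
    Returns the mean, over strong views, of the cross-entropy on confident pixels."""
    p_weak = np.exp(log_softmax(weak_logits))
    confidence = p_weak.max(axis=0)          # (H, W) highest class probability
    pseudo = p_weak.argmax(axis=0)           # (H, W) hard pseudo-label
    mask = confidence >= threshold           # keep only confident pixels
    if not mask.any():
        return 0.0
    losses = []
    for strong in strong_logits_list:
        log_p = log_softmax(strong)
        # negative log-probability of the pseudo-label class at each pixel
        nll = -np.take_along_axis(log_p, pseudo[None], axis=0)[0]
        losses.append(nll[mask].mean())
    return float(np.mean(losses))


# Toy example: the weak view is confident that every pixel belongs to class 1.
weak = np.zeros((2, 4, 4))
weak[1] = 10.0
agree = weak.copy()
disagree = weak[::-1].copy()                 # a strong view that prefers class 0
print(weak_to_strong_loss(weak, [agree, agree]))     # close to 0
print(weak_to_strong_loss(weak, [agree, disagree]))  # about 5: one view is penalised
```

Averaging over several strong views, rather than using a single one, is one plausible reading of "multiple strongly augmented views"; each view is pulled toward the same weak-view target, which reduces the output discrepancy between views.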
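The reported DSC values are Dice similarity coefficients, DSC = 2|P ∩ G| / (|P| + |G|), where P is the predicted mask and G the ground-truth mask. A minimal sketch of the metric follows; the abstract does not say how scores are averaged over classes, slices, or volumes, so averaging over foreground classes and treating two empty masks as a perfect match are assumptions, and the function names are hypothetical.

```python
import numpy as np


def dice_coefficient(pred, target):
    """DSC = 2 * |intersection| / (|pred| + |target|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def mean_foreground_dice(pred_labels, gt_labels, num_classes):
    """Average the per-class DSC over classes 1..num_classes-1 (class 0 = background)."""
    return float(np.mean([dice_coefficient(pred_labels == k, gt_labels == k)
                          for k in range(1, num_classes)]))


pred = np.array([[0, 1], [2, 2]])
gt = np.array([[0, 1], [2, 1]])
print(round(100 * mean_foreground_dice(pred, gt, num_classes=3), 2))  # 66.67
```

Multiplying by 100 gives the percentage form used in the abstract.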

Key words: semi-supervised medical image segmentation, pseudo-labeling, Transformer, multi-scale, multi-view

LI Feixiang, JIANG Ailian. MSMVT: Semi-Supervised Framework with Multi-Scale and Multi-View Transformer for Medical Image Segmentation[J]. Computer Engineering and Applications, 2025, 61(2): 273-282.
