Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (2): 273-282. DOI: 10.3778/j.issn.1002-8331.2309-0022

• Graphics and Image Processing •

MSMVT: Semi-Supervised Framework with Multi-Scale and Multi-View Transformer for Medical Image Segmentation

LI Feixiang, JIANG Ailian   

  1. College of Computer Science and Technology & College of Data Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Online: 2025-01-15  Published: 2025-01-15

MSMVT:多尺度和多视图Transformer半监督医学图像分割框架

李飞翔,降爱莲   

  1. 太原理工大学 计算机科学与技术学院(大数据学院),山西 晋中 030600

Abstract: Transformers have achieved remarkable results on many supervised computer vision tasks in recent years, but their performance in semi-supervised image segmentation remains limited because high-quality annotated medical images are scarce. This paper proposes MSMVT, a semi-supervised medical image segmentation framework built on a multi-scale and multi-view Transformer. Motivated by the promising results of contrastive learning in Transformer pre-training, the paper designs a pseudo-label-guided multi-scale prototype contrastive learning module. The module uses image pyramid data augmentation to generate semantically rich multi-scale prototype representations for unlabeled images, and contrastive learning then reinforces the consistency between prototypes at different scales, which mitigates the undertraining of the Transformer caused by label scarcity. To stabilize Transformer training, the paper further proposes a multi-view consistency learning strategy in which weakly augmented views are used to correct multiple strongly augmented views. Minimizing the output discrepancies between views keeps the model consistent at multiple levels under different augmentations. With only 10% of the annotations, MSMVT achieves Dice similarity coefficient (DSC) scores of 88.93%, 84.75%, and 85.38% on the public ACDC, LIDC, and ISIC datasets, respectively, outperforming existing semi-supervised medical image segmentation methods.
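
The multi-view consistency strategy summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the weak view is turned into per-pixel pseudo-labels by a confidence threshold (a FixMatch-style choice), and each strong view is penalized with cross-entropy against those pseudo-labels. The names `softmax`, `multi_view_consistency`, and `threshold` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_view_consistency(weak_logits, strong_views_logits, threshold=0.8):
    """Average cross-entropy of strong-view predictions against weak-view pseudo-labels.

    weak_logits: list of per-pixel logit lists, shape [n_pixels][n_classes].
    strong_views_logits: list of views, each with the same shape as weak_logits.
    threshold: assumed confidence cut-off; pixels below it are ignored.
    """
    total, count = 0.0, 0
    for i, pixel_logits in enumerate(weak_logits):
        probs = softmax(pixel_logits)
        confidence = max(probs)
        if confidence < threshold:
            continue  # skip pixels whose weak-view prediction is unreliable
        pseudo_label = probs.index(confidence)
        for view in strong_views_logits:
            p = softmax(view[i])[pseudo_label]
            total += -math.log(max(p, 1e-12))  # clamp to avoid log(0)
            count += 1
    return total / count if count else 0.0


# Two pixels, two classes: only the first pixel is confident in the weak view.
weak = [[4.0, 0.0], [0.1, 0.0]]
agreeing_views = [[[5.0, 0.0], [0.0, 3.0]], [[3.0, 0.0], [2.0, 0.0]]]
disagreeing_views = [[[0.0, 5.0], [0.0, 3.0]], [[0.0, 3.0], [2.0, 0.0]]]

loss_agree = multi_view_consistency(weak, agreeing_views)
loss_disagree = multi_view_consistency(weak, disagreeing_views)
print(loss_agree, loss_disagree)
```

In this sketch the loss is small when the strong views agree with the confident weak-view prediction and large when they contradict it, which is the behavior a consistency term of this kind is meant to encourage. A real implementation would operate on tensors of segmentation logits rather than nested lists.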

Key words: semi-supervised medical image segmentation, pseudo-labeling, Transformer, multi-scale, multi-view

摘要: 近年来,Transformer在众多监督式计算机视觉任务中取得了显著进展,然而由于高质量医学标注图像的缺乏,其在半监督图像分割领域的性能仍有待提高。为此,提出了一种基于多尺度和多视图Transformer的半监督医学图像分割框架:MSMVT(multi-scale and multi-view transformer)。鉴于对比学习在Transformer的预训练中取得的良好效果,设计了一个基于伪标签引导的多尺度原型对比学习模块。该模块利用图像金字塔数据增强技术,为无标签图像生成富有语义信息的多尺度原型表示;通过对比学习,强化了不同尺度原型之间的一致性,从而有效缓解了由标签稀缺性导致的Transformer训练不足的问题。此外,为了增强Transformer模型训练的稳定性,提出了多视图一致性学习策略。通过弱扰动视图,以校正多个强扰动视图。通过最小化不同视图之间的输出差异性,使得模型能够对不同扰动保持多层次的一致性。实验结果表明,当仅采用10%的标注比例时,提出的MSMVT框架在ACDC、LIDC和ISIC三个公共数据集上的DSC图像分割性能指标分别达到了88.93%、84.75%和85.38%,优于现有的半监督医学图像分割方法。

关键词: 半监督医学图像分割, 伪标签, Transformer, 多尺度, 多视图