Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 298-309. DOI: 10.3778/j.issn.1002-8331.2405-0087

• Graphics and Image Processing •

Mask Reconstruction Fused with Contrastive Learning for Self-Supervised Medical Image Segmentation

XIAO Cimei, JIANG Ailian, JI Wei, GAO Feng   

  1. College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Online: 2025-08-01    Published: 2025-07-31

Abstract: In medical image analysis, self-supervised methods are increasingly used because large-scale labeled data are difficult to obtain. Among these methods, masked autoencoders struggle to accurately capture regions of interest and to effectively control how masked regions are generated and exploited. This paper proposes a self-supervised medical image segmentation method, mask reconstruction fused with contrastive learning (MRCL), which comprises a contrastive reconstruction task, a hybrid convolutional feature fusion module, and a multi-scale encoder architecture. The contrastive reconstruction task introduces contrastive learning into the masked autoencoder: by learning the similarities and differences between regions of an image, it sharpens feature discrimination, and by optimizing the representations of two randomly masked views with a contrastive loss, it strengthens the masked autoencoder's ability to capture regions of interest. Because contrastive learning relies on strong data augmentation, it also improves the generalization of the model. In addition, the hybrid convolutional feature fusion module fuses attention layers and convolutional layers in a complementary design, enabling the model to extract both local and global features effectively, while the multi-scale encoder architecture fuses feature maps of different scales, improving the model's ability to represent multi-scale information. Experimental results show that, using only 20% of the labeled data, the proposed method achieves Dice similarity coefficient (DSC) scores of 86.61%, 80.19%, and 87.55% on the public ACDC, LIDC, and ISIC datasets, respectively, outperforming existing self-supervised medical image segmentation methods.
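
To make the contrastive reconstruction task concrete, the following is a minimal PyTorch-style sketch of such a combined objective: MAE-style reconstruction of masked patches plus an InfoNCE contrastive loss that aligns the representations of two independently masked views of the same image. The model(images, mask_ratio) interface, the pooled view representation z, and the weighting coefficient alpha are hypothetical placeholders for illustration, not the authors' implementation.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # InfoNCE loss between two batches of view representations of shape (B, D).
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    # Matching views (the diagonal) are positives; all other pairs are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def contrastive_reconstruction_loss(model, images, mask_ratio=0.75, alpha=1.0):
    # Two forward passes with independent random masks give two masked views.
    # The hypothetical model returns per-patch predictions, per-patch targets,
    # the binary patch mask, and a pooled view representation.
    pred1, target1, mask1, z1 = model(images, mask_ratio)
    pred2, target2, mask2, z2 = model(images, mask_ratio)

    # MAE-style reconstruction: mean squared error computed on masked patches only.
    rec1 = ((pred1 - target1) ** 2).mean(dim=-1)
    rec2 = ((pred2 - target2) ** 2).mean(dim=-1)
    rec_loss = (rec1 * mask1).sum() / mask1.sum() + (rec2 * mask2).sum() / mask2.sum()

    # Contrastive term pulls the two masked views of the same image together and
    # pushes apart views of different images, sharpening feature discrimination.
    con_loss = info_nce(z1, z2)
    return rec_loss + alpha * con_loss

The temperature and alpha values above are illustrative hyperparameters; the paper's actual loss weighting and view-generation details are not specified in the abstract.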

Key words: self-supervision, medical image segmentation, multi-scale, contrastive learning, masked autoencoder