Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (22): 257-266. DOI: 10.3778/j.issn.1002-8331.2408-0166

• Graphics and Image Processing •

动态上下文感知及残差注意力食管癌病变分割

丁楠,李小霞,曹耀丹,毛艳会,何琴,姜坤元,程杰,周颖玥   

    1.西南科技大学 信息工程学院,四川 绵阳 621010
    2.四川省工业自主可控人工智能工程技术研究中心,四川 绵阳 621010
    3.四川绵阳四〇四医院,四川 绵阳 621000
  • 出版日期:2025-11-15 发布日期:2025-11-14

Dynamic Context-Aware and Residual Attention Esophageal Cancer Lesion Segmentation

DING Nan, LI Xiaoxia, CAO Yaodan, MAO Yanhui, HE Qin, JIANG Kunyuan, CHENG Jie, ZHOU Yingyue   

    1.School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
    2.Sichuan Industrial Autonomous and Controllable Artificial Intelligence Engineering Technology Research Center, Mianyang, Sichuan 621010, China
    3.Sichuan Mianyang 404 Hospital, Mianyang, Sichuan 621000, China
  • Online:2025-11-15 Published:2025-11-14

摘要: 针对食管早癌及癌前病变区域细粒度分割中出现的类间差异小、类内差异大和边缘模糊等问题,提出一种动态上下文感知残差注意力网络。使用金字塔视觉Transformer v2(pyramid vision transformer v2,PVTv2)作为特征提取网络设计PVT分支,提取主要的特征表示,再通过残差块的堆叠设计残差注意力全卷积分支,对细节特征进行补充。在全卷积分支的编码器上设计动态残差注意力特征增强模块,在强化重要特征表示的同时,保留编码器每个阶段的初始图像信息。在PVT分支的解码阶段设计动态上下文特征感知引导模块,利用多尺度特征引导局部与全局信息的动态融合,实现自适应的渐进解码过程,在保留细节信息的同时,加深对全局上下文的理解。分别在自建食管癌数据集,公开数据集Kvasir-SEG、CVC-ClinicDB和ISIC2018上进行验证,实验结果表明,相似系数分别达到了74.92%、95.79%、96.83%和92.89%,优于主流分割网络。

关键词: 医学图像分割, 动态注意力, 跨尺度特征融合, 上下文感知

Abstract: To address the small inter-class differences, large intra-class differences, and blurred edges encountered in fine-grained segmentation of early esophageal cancer and precancerous lesions, a dynamic context-aware residual attention network is proposed. The pyramid vision transformer v2 (PVTv2) serves as the feature extraction network of a PVT branch that captures the primary feature representations, and a residual attention fully convolutional branch, built by stacking residual blocks, supplements them with detail features. A dynamic residual attention feature enhancement module is placed in the encoder of the fully convolutional branch to strengthen important feature representations while preserving the initial image information at each encoder stage. In the decoding stage of the PVT branch, a dynamic context feature-aware guidance module uses multi-scale features to guide the dynamic fusion of local and global information, giving an adaptive, progressive decoding process that retains detail while deepening the understanding of global context. Experiments on a self-built esophageal cancer dataset and on the public datasets Kvasir-SEG, CVC-ClinicDB, and ISIC2018 yield Dice coefficients of 74.92%, 95.79%, 96.83%, and 92.89%, respectively, outperforming mainstream segmentation networks.

Key words: medical image segmentation, dynamic attention, cross-scale feature fusion, context awareness
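
The results in the abstract are reported as Dice coefficients. As a reminder of what that metric measures, the following is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), for two flattened binary masks. It is not the authors' evaluation code: the function name `dice_coefficient`, the flat-list input format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks.

    Hypothetical helper for illustration only; any non-zero value is
    treated as a foreground pixel.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # |A| + |B|: foreground pixel counts of each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    # One shared foreground pixel, three foreground pixels in total.
    print(f"Dice = {dice_coefficient(predicted, ground_truth):.4f}")
```

A reported value such as 74.92% corresponds to 0.7492 on this scale, typically averaged over the images of a test set; how the paper averages is not stated in the abstract.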