Computer Engineering and Applications, 2025, Vol. 61, Issue (20): 281-294. DOI: 10.3778/j.issn.1002-8331.2407-0148

• Graphics and Image Processing •

Multi-Scale Dual Cross Attention Transformer Network for Change Detection in Remote Sensing Images

DENG Wenhao, DUAN Zhongxing   

  1. College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
    2. Xi’an Key Laboratory of Building Manufacturing Intelligent & Automation Technology, Xi’an University of Architecture and Technology, Xi’an 710055, China
  • Online: 2025-10-15  Published: 2025-10-15

Abstract: Existing deep learning-based methods tend to focus on extracting high-level semantic change features and struggle to capture changes in fine ground-object details. This leads to blurred change boundaries and susceptibility to pseudo-changes. In addition, the skip connections in the traditional U-shaped architecture have difficulty narrowing the semantic gap between the encoder and the decoder. To address these problems, a multi-scale dual cross attention Transformer network (MDCATNet) is proposed for change detection in remote sensing images. In the encoder, MDCATNet uses a primary feature conservation strategy and convolutional blocks with residual structures to build a weight-sharing Siamese network that extracts multi-scale features from the bi-temporal images. In the decoder, a novel multi-scale multi-head channel-spatial cross fusion Transformer module replaces the traditional skip connections. It narrows the semantic gap between the encoder and the decoder and fully fuses the long-range channel and spatial information of the multi-scale features. To further refine the features and recover more detail in changed regions and smoother boundary contours, a channel cross attention refinement module is proposed. It refines the features layer by layer from bottom to top and generates high-quality prediction maps. Experiments on the LEVIR-CD and SYSU-CD datasets show that, compared with six other algorithms, MDCATNet achieves the best detection results in both quantitative evaluation and visual comparison, and exhibits stronger generalization ability.
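The weight-sharing (Siamese) encoder described in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' MDCATNet encoder. Several parts are assumptions: the single-channel images, the fixed kernels that stand in for learned weights, the `ReLU(conv(x)) + x` residual unit, the 2x2 max pooling between scales and the absolute-difference comparison. Every function name is hypothetical. The sketch shows one property: both acquisition dates pass through exactly the same parameters. Any difference between the per-scale outputs therefore comes from the images rather than from the weights.

```python
# Toy sketch (assumptions, not MDCATNet): single-channel images, fixed 3x3
# kernels standing in for learned weights, and hypothetical function names.
from typing import List

Image = List[List[float]]
Kernel = List[List[float]]


def conv3x3(x: Image, k: Kernel) -> Image:
    """3x3 cross-correlation with zero padding; output has the input's size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out


def residual_block(x: Image, k: Kernel) -> Image:
    """y = ReLU(conv(x)) + x: the identity path carries the input through unchanged."""
    y = conv3x3(x, k)
    return [[max(0.0, y[i][j]) + x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def max_pool2(x: Image) -> Image:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def encode(x: Image, kernels: List[Kernel]) -> List[Image]:
    """One feature map per scale; the resolution halves before each deeper stage."""
    feats = []
    for level, k in enumerate(kernels):
        if level > 0:
            x = max_pool2(x)
        x = residual_block(x, k)
        feats.append(x)
    return feats


def siamese_diff(t1: Image, t2: Image, kernels: List[Kernel]) -> List[Image]:
    """Encode both dates with the SAME kernels, then take |f1 - f2| at every scale."""
    f1, f2 = encode(t1, kernels), encode(t2, kernels)
    return [[[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
            for m1, m2 in zip(f1, f2)]


# Example: two 4x4 images that differ in a single pixel (made-up data).
avg = [[1 / 9] * 3 for _ in range(3)]  # averaging kernel in place of learned weights
t1 = [[0.0] * 4 for _ in range(4)]
t2 = [row[:] for row in t1]
t2[1][1] = 9.0
diffs = siamese_diff(t1, t2, [avg, avg])  # two scales: 4x4 and 2x2
```

In this example the finest-scale difference is large at the changed pixel and smaller in its 3x3 neighborhood. It is exactly zero in the far corner `(3, 3)`. Calling `siamese_diff(t2, t1, ...)` gives the same result, because swapping the dates cannot matter when the two branches share their weights.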

Key words: remote sensing image, change detection, semantic gap, skip connection, Transformer, cross attention
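The keywords pair skip connection with cross attention. The following is a minimal sketch of channel-wise cross attention between a decoder feature and an encoder feature. It assumes both features are flattened to C x N matrices (C channels, N = H x W positions) with the same N. Queries come from the decoder channels and keys/values from the encoder channels, so the attention map is C x C and mixes channels rather than pixels. This is a simplification. It is not the paper's multi-scale multi-head channel-spatial cross fusion Transformer or its channel cross attention refinement module. Learned projections, multiple heads, normalization and any spatial branch are omitted, and all names are hypothetical.

```python
# Hypothetical sketch of channel-wise cross attention (not the MDCATNet modules).
import math
from typing import List

Matrix = List[List[float]]  # C rows (channels) x N columns (flattened H*W positions)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def channel_attention_map(dec: Matrix, enc: Matrix) -> Matrix:
    """C_dec x C_enc weights: how much each decoder channel attends to each encoder channel."""
    scale = math.sqrt(len(enc[0]))  # scaled dot product over the N spatial positions
    scores = matmul(dec, transpose(enc))
    return softmax_rows([[s / scale for s in row] for row in scores])


def channel_cross_attention(dec: Matrix, enc: Matrix) -> Matrix:
    """Decoder queries, encoder keys/values, then a residual add of the decoder feature."""
    fused = matmul(channel_attention_map(dec, enc), enc)  # C_dec x N
    return [[d + f for d, f in zip(dr, fr)] for dr, fr in zip(dec, fused)]


# Example with made-up features: 2 decoder channels, 2 encoder channels, N = 4.
dec = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
enc = [[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
out = channel_cross_attention(dec, enc)  # [[3.0, 2.0, 2.0, 2.0], [2.0, 3.0, 2.0, 2.0]]
```

The attention map has one entry per pair of channels, so its size does not depend on image resolution. A spatial branch would instead compare the N positions with each other. How MDCATNet combines the channel and spatial paths is not reproduced here.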