Computer Engineering and Applications, 2024, Vol. 60, Issue (6): 259-273. DOI: 10.3778/j.issn.1002-8331.2211-0099

• Graphics and Image Processing •

Wavelet Frequency Division Self-Attention Transformer Image Deraining Network

FANG Siyan, LIU Bin

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
  • Online: 2024-03-15; Published: 2024-03-15

Abstract: The vision Transformer (ViT) has a weak ability to capture high-frequency information, and many image deraining methods tend to lose details. To address both problems, this paper proposes a wavelet frequency division self-attention Transformer image deraining network (WFDST-Net). Its main module, the wavelet frequency division self-attention Transformer (WFDST), uses the non-separable lifting wavelet transform to obtain the low-frequency and high-frequency components of a feature map. It then performs self-attention within the low-frequency components and within the high-frequency components separately. As a result, the module learns to restore global structure from the low frequencies and becomes better at capturing line-like details such as rain streaks in the high frequencies, which strengthens its modeling of features in different frequency bands. WFDST-Net adopts a U-shaped architecture and obtains multi-scale features through the non-separable lifting wavelet transform, so it can capture high-frequency rain streaks of different shapes while preserving the integrity of the information. WFDST-Net also has fewer parameters than other Transformers designed for image deraining. In addition, the VOCRain250 dataset is proposed for joint image deraining and semantic segmentation; it has advantages over the currently widely used BDD150 dataset. Experiments show that the proposed method enhances the ability of ViT to capture information in different frequency bands. It also outperforms current state-of-the-art deraining methods on synthetic and real-world datasets as well as on VOCRain250 for joint deraining and semantic segmentation. It effectively removes complex rain streaks while retaining more background details.

Key words: image deraining, Transformer, self-attention, non-separable lifting wavelet, frequency domain
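
The abstract does not give the paper's lifting filters or the internal layout of a WFDST block, so the following is only an illustrative sketch of the frequency-split idea under stated assumptions. It assumes a single-level quincunx (checkerboard) lifting scheme, a common non-separable 2D lifting construction. The predict step subtracts the mean of each odd-lattice sample's four neighbours, the update step adds half the mean of the neighbouring details, and borders are periodic. It then runs single-head self-attention among low-band tokens and, with separate weights, among high-band tokens. This is one simple reading of "self-attention interaction in the low frequency and high frequency respectively". The function names (`quincunx_lift`, `quincunx_unlift`, `frequency_split_attention`) and the random weights are hypothetical. The round trip shows that lifting steps are invertible by construction, which is presumably why wavelet-based downsampling can be said to preserve the integrity of information.

```python
import numpy as np

# Hypothetical sketch, NOT the WFDST-Net authors' code: the abstract does not give
# their lifting filters, so a simple quincunx (checkerboard) lifting scheme is assumed.


def _avg4(x):
    """Mean of the up/down/left/right neighbours, periodic borders; x is (..., H, W)."""
    return (np.roll(x, 1, axis=-2) + np.roll(x, -1, axis=-2)
            + np.roll(x, 1, axis=-1) + np.roll(x, -1, axis=-1)) / 4.0


def _odd_mask(h, w):
    """True where (row + col) is odd, i.e. the second coset of the quincunx lattice."""
    i, j = np.indices((h, w))
    return (i + j) % 2 == 1


def quincunx_lift(x):
    """One non-separable lifting level. x: (C, H, W) with H and W even.
    Returns (low, high), each of shape (C, H * W // 2)."""
    _, h, w = x.shape
    odd = _odd_mask(h, w)
    # Predict: odd samples minus the mean of their 4 (even-coset) neighbours -> details.
    d = np.where(odd, x - _avg4(x), 0.0)
    # Update: even samples plus half the mean of their 4 neighbouring details -> approximation.
    s = np.where(~odd, x + _avg4(d) / 2.0, 0.0)
    return s[:, ~odd], d[:, odd]


def quincunx_unlift(low, high, h, w):
    """Exact inverse of quincunx_lift: undo the update step, then the predict step."""
    c = low.shape[0]
    odd = _odd_mask(h, w)
    d = np.zeros((c, h, w))
    d[:, odd] = high
    s = np.zeros((c, h, w))
    s[:, ~odd] = low
    even = np.where(~odd, s - _avg4(d) / 2.0, 0.0)
    return np.where(odd, d + _avg4(even), even)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over tokens of shape (N, C)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v


def frequency_split_attention(x, params):
    """Hypothetical frequency-split block: attention runs only among low-band tokens
    and, with separate weights, only among high-band tokens; residual add, then inverse."""
    _, h, w = x.shape
    low, high = quincunx_lift(x)
    low = low + self_attention(low.T, *params["low"]).T
    high = high + self_attention(high.T, *params["high"]).T
    return quincunx_unlift(low, high, h, w)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.standard_normal((C, H, W))
low, high = quincunx_lift(x)
print(low.shape, high.shape)  # (8, 128) (8, 128)
print(np.allclose(quincunx_unlift(low, high, H, W), x))  # True: lifting is invertible
params = {band: [0.1 * rng.standard_normal((C, C)) for _ in range(3)]
          for band in ("low", "high")}
y = frequency_split_attention(x, params)
print(y.shape)  # (8, 16, 16)
```

This quincunx step splits the map into only two half-size lattices, whereas the paper's transform may yield several high-frequency sub-bands per level. A full Transformer block would also add normalization, multi-head projections, and a feed-forward layer, which are omitted here to keep the band separation visible.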