Wavelet Frequency Division Self-Attention Transformer Image Deraining Network

doi:10.3778/j.issn.1002-8331.2211-0099

Abstract

Abstract: Vision Transformers (ViT) capture high-frequency information poorly, and many current image deraining methods tend to lose detail. To address both problems, a wavelet frequency division self-attention Transformer image deraining network (WFDST-Net) is proposed. Its main module, the wavelet frequency division self-attention Transformer (WFDST), uses the non-separable lifting wavelet transform to obtain the low-frequency and high-frequency components of a feature map and performs self-attention separately within each. The module thereby learns to restore global structure from the low-frequency component and strengthens its ability to capture line details such as rain streaks in the high-frequency components, which improves its modeling of features in different frequency bands. WFDST-Net adopts a U-shaped architecture and obtains multi-scale features through the non-separable lifting wavelet transform, so it can capture high-frequency rain streaks of different shapes while preserving the integrity of the information. WFDST-Net has fewer parameters than other Transformers used for image deraining. In addition, the VOCRain250 dataset is proposed for the joint image deraining and semantic segmentation task; it has advantages over the currently widely used BDD150. Experimental results show that the proposed method enhances the ability of ViT to capture information in different frequency bands and outperforms current state-of-the-art deraining methods on synthetic datasets, real-world datasets, and VOCRain250. It effectively removes complex rain streaks while retaining more background detail.

Key words: image deraining, Transformer, self-attention, non-separable lifting wavelet, frequency domain
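The frequency-division idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a separable 2x2 Haar transform stands in for the non-separable lifting wavelet transform, each value in a subband is treated as a one-dimensional token, and the query, key, and value projections are taken to be the identity. The function names `haar_split`, `haar_merge`, `self_attention`, and `frequency_division_block` are hypothetical.

```python
import math

def haar_split(x):
    """Split an even-sized 2D map into LL, LH, HL, HH half-resolution subbands.
    Assumption: orthonormal Haar stands in for the paper's non-separable wavelet."""
    h, w = len(x), len(x[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b, c, d = x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # low frequency (local average)
            lh[r][s] = (a - b + c - d) / 2  # horizontal differences
            hl[r][s] = (a + b - c - d) / 2  # vertical differences
            hh[r][s] = (a - b - c + d) / 2  # diagonal differences
    return ll, lh, hl, hh

def haar_merge(ll, lh, hl, hh):
    """Invert haar_split exactly, so downsampling discards no information."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            x[2 * r][2 * s] = (p + q + u + v) / 2
            x[2 * r][2 * s + 1] = (p - q + u - v) / 2
            x[2 * r + 1][2 * s] = (p + q - u - v) / 2
            x[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return x

def self_attention(band):
    """Scaled dot-product self-attention over the scalar tokens of one subband.
    Assumption: identity Q/K/V projections and embedding dimension 1."""
    tokens = [v for row in band for v in row]
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, tokens)) / total)
    w = len(band[0])
    return [out[i:i + w] for i in range(0, len(out), w)]

def frequency_division_block(x):
    """Attend within each frequency band separately, then reconstruct."""
    bands = haar_split(x)
    return haar_merge(*(self_attention(b) for b in bands))

feature = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0],
           [9.0, 10.0, 11.0, 12.0],
           [13.0, 14.0, 15.0, 16.0]]
reconstructed = haar_merge(*haar_split(feature))
mixed = frequency_division_block(feature)
```

Here `reconstructed` matches `feature` up to floating-point error, which is the sense in which a wavelet-based downsampling step keeps the information intact, while `mixed` has the same shape as the input after tokens have interacted only within their own frequency band.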

FANG Siyan, LIU Bin. Wavelet Frequency Division Self-Attention Transformer Image Deraining Network[J]. Computer Engineering and Applications (计算机工程与应用), 2024, 60(6): 259-273.

