Improved SegFormer Network Based Method for Semantic Segmentation of Remote Sensing Images

doi:10.3778/j.issn.1002-8331.2204-0141

Abstract

Existing segmentation algorithms struggle to accurately segment small objects and object boundaries in remote sensing images, because these images contain objects at many scales and small objects carry insufficient semantic information. To address this, an improved SegFormer-based semantic segmentation method for remote sensing images is proposed, which merges the multi-scale features output by the SegFormer encoder in a cascaded manner. When merging high-level semantic features, a semantic feature fusion module is used to preserve fuzzy boundaries; when merging detail features, a gated attention module filters out part of the high-level semantic features to reduce their interference with the detail features. The multi-scale features are then upsampled and concatenated, and a multi-local channel attention module recalibrates the mapping relationships of the concatenated features according to their channel context, improving the final segmentation result. Experimental results on the UAVid and ISPRS Potsdam datasets show that the improved SegFormer method outperforms the mainstream segmentation methods it is compared with, and segments small objects and boundaries in remote sensing images more accurately.

Keywords: remote sensing image, semantic segmentation, feature fusion, gated attention, multi-local channel attention
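
The data flow described in the abstract (cascaded merging from coarse to fine, gating of semantic features when merging detail features, then upsampling, concatenation, and channel recalibration) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not give the formulas, so the semantic feature fusion module is replaced by a plain addition, the gate is assumed to be a sigmoid over a 1x1 channel mixing of the detail feature, the multi-local channel attention is approximated by a single ECA-style 1D convolution across channels, and all four encoder features are assumed to have already been projected to the same channel count. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def gated_merge(detail, semantic, w_gate):
    """Add a gated, upsampled semantic feature to a finer detail feature.

    Assumption: the gate is sigmoid(1x1 conv of the detail feature), where
    the 1x1 conv is the channel-mixing matrix w_gate of shape (C, C).
    """
    factor = detail.shape[1] // semantic.shape[1]
    sem_up = upsample_nearest(semantic, factor)
    gate = sigmoid(np.einsum("oc,chw->ohw", w_gate, detail))
    return detail + gate * sem_up

def channel_attention(x, kernel):
    """ECA-style stand-in: global average pool, 1D conv over channels, sigmoid."""
    pooled = x.mean(axis=(1, 2))                 # one value per channel
    padded = np.pad(pooled, len(kernel) // 2)    # zero-pad so length is preserved
    local = np.convolve(padded, kernel[::-1], mode="valid")
    return x * sigmoid(local)[:, None, None]

def fuse_multiscale(feats, w_gate, kernel):
    """feats: four (C, H, W) maps ordered fine to coarse, each half the size of the previous."""
    f1, f2, f3, f4 = feats
    m3 = f3 + upsample_nearest(f4, 2)    # stand-in for the semantic feature fusion module
    m2 = gated_merge(f2, m3, w_gate)     # a real model would learn separate gate weights
    m1 = gated_merge(f1, m2, w_gate)
    size = f1.shape[1]
    ups = [upsample_nearest(m, size // m.shape[1]) for m in (m1, m2, m3, f4)]
    return channel_attention(np.concatenate(ups, axis=0), kernel)

rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, s, s)) for s in (16, 8, 4, 2)]
out = fuse_multiscale(feats, rng.standard_normal((4, 4)), np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (16, 16, 16)
```

The sketch uses fixed random weights only to show tensor shapes; in a trained network the gate weights and the channel-attention kernel would be learned, and a segmentation head would follow the recalibrated features.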

TIAN Xuewei, WANG Jiali, CHEN Ming, DU Shouqing. Improved SegFormer Network Based Method for Semantic Segmentation of Remote Sensing Images[J]. Computer Engineering and Applications, 2023, 59(8): 217-226.

田雪伟, 汪佳丽, 陈明, 杜守庆. 改进SegFormer网络的遥感图像语义分割方法[J]. 计算机工程与应用, 2023, 59(8): 217-226.
