Improved SegFormer Remote Sensing Image Semantic Segmentation Network

doi:10.3778/j.issn.1002-8331.2307-0167

Abstract

Abstract: With the development of remote sensing technology, semantic segmentation of remote sensing images is being applied ever more widely in fields such as urban and rural resource management and planning. Because small unmanned aerial vehicles (UAVs) are cost-effective, flexible, and easy to operate for remote sensing data acquisition, UAV photography has become the preferred way to collect remote sensing image datasets. Small UAVs shoot at low altitude and oblique angles, so compared with images from traditional remote sensing equipment, UAV images contain richer target detail and more complex relationships between targets; as a result, traditional deep learning models based on local convolution are no longer adequate for this task. To address these problems, an improved remote sensing image semantic segmentation network based on SegFormer is proposed. First, an edge contour extraction module (ECEM) is added to the encoder to help the model extract shallow features of targets. Second, because buildings dominate urban remote sensing images, a multi-scale atrous spatial pyramid pooling (MSASPP) module, in which global average pooling is replaced by multi-scale strip pooling (MSP), is added to the encoder to extract features of elongated targets. Third, because the original decoder is poorly suited to restoring feature information, the decoder follows the structure of the U-Net decoder: features received from the encoder are fused before upsampling and squeeze-and-excitation (SE) channel attention are applied, which strengthens feature propagation and fusion. The improved network is evaluated on the Vaihingen dataset provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) and on UAVid, a UAV remote sensing image semantic segmentation dataset. It achieves a mean intersection over union (MIoU) of 90.30% and 77.90%, respectively, a higher segmentation accuracy than general-purpose segmentation networks such as DeepLabV3+ and Swin-Unet.

Key words: high-resolution remote sensing imagery, semantic segmentation, multi-scale strip pooling, edge contour extraction module
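
To make the strip pooling idea mentioned in the abstract concrete, the following is a minimal pure-Python sketch; it is not the authors' MSP or MSASPP implementation. It averages a feature map along each row and each column, then broadcasts the two strip descriptors back to the full map. The function names `strip_pool`, `fuse_strips` and `global_avg_pool` and the toy feature map are assumptions made for illustration, and the learned convolutions, multiple scales and gating found in published strip pooling modules are omitted.

```python
def strip_pool(feature):
    """Average a 2D feature map (list of rows) along rows and columns.

    Returns (row_avg, col_avg): one value per row (horizontal strips)
    and one value per column (vertical strips).
    """
    height, width = len(feature), len(feature[0])
    row_avg = [sum(row) / width for row in feature]
    col_avg = [sum(feature[i][j] for i in range(height)) / height
               for j in range(width)]
    return row_avg, col_avg


def fuse_strips(row_avg, col_avg):
    """Broadcast both strip descriptors back to H x W and add them."""
    return [[r + c for c in col_avg] for r in row_avg]


def global_avg_pool(feature):
    """Collapse the whole map to a single average, for comparison."""
    values = [v for row in feature for v in row]
    return sum(values) / len(values)


# Toy map (illustrative assumption): one horizontal bar, standing in
# for an elongated target such as a road or a building edge.
feature = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

row_avg, col_avg = strip_pool(feature)
fused = fuse_strips(row_avg, col_avg)

print("row averages:", row_avg)   # the bar's row stands out
print("column averages:", col_avg)
print("global average:", global_avg_pool(feature))
```

On this toy map, global average pooling yields a single value for the whole image, whereas the row averages still indicate which row holds the elongated target. That positional information is what strip-shaped pooling windows are meant to preserve.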

ZHANG Hao (张昊), HE Lingmin (何灵敏), PAN Chen (潘晨). Improved SegFormer Remote Sensing Image Semantic Segmentation Network[J]. Computer Engineering and Applications (计算机工程与应用), 2023, 59(24): 248-258.
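
The results in the abstract are reported as mean intersection over union (MIoU). As a hedged illustration of how this metric is commonly computed, the sketch below takes flattened predicted and ground-truth label lists and averages the per-class IoU. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions; the paper's exact evaluation protocol (for example, which classes are ignored) is not stated on this page.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flattened integer label lists.

    For each class c: IoU_c = |pred == c AND target == c|
                              / |pred == c OR target == c|.
    Classes that appear in neither list are skipped (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    if not ious:
        raise ValueError("no class present in pred or target")
    return sum(ious) / len(ious)


# Toy labels for six pixels and three classes (illustrative only).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]

# Per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2.
print(mean_iou(pred, target, num_classes=3))
```

In practice the intersection and union counts are usually accumulated over the whole test set, typically through a confusion matrix, before the per-class ratios are averaged, rather than computed image by image.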
