Remote Sensing Image Semantic Segmentation Network Based on Multimodal Feature Fusion

doi:10.3778/j.issn.1002-8331.2207-0010

Abstract

Semantic segmentation of remote sensing images assigns a semantic label to every pixel of an image to produce a segmentation map, and it is widely used in fields such as land and resource planning and smart cities. High-resolution remote sensing images contain objects of widely varying size and scale and are affected by shadow occlusion, which makes it difficult to segment similar-looking objects and shadow-occluded objects from a single modality. To address these problems, MMFNet is proposed, a remote sensing image semantic segmentation network that fuses IRRG (infrared, red, green) images with DSM (digital surface model) images. The network adopts an encoder-decoder structure. The encoder uses two input streams to extract the spectral features of the IRRG images and the height features of the DSM images simultaneously. The decoder uses residual decoding blocks (RDB) to extract the fused features and dense connections to strengthen feature propagation and reuse. A complex atrous spatial pyramid pooling (CASPP) module is proposed to extract multi-scale features from the skip connections. Experiments on the Vaihingen and Potsdam datasets provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) show that MMFNet achieves global accuracies of 90.44% and 90.70%, respectively, which is higher than that of general-purpose segmentation networks such as DeepLabV3+ and OCRNet and of networks designed specifically for these datasets, such as CEVO and UFMG_4.

Key words: high-resolution remote sensing imagery, semantic segmentation, multiscale features, pyramid pooling
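
The abstract does not describe the internal design of CASPP, so the following is only a minimal sketch of the general idea behind atrous spatial pyramid pooling: the same kernel is applied at several dilation rates so that each branch sees a different receptive field. It is written in plain Python on a 1D signal for readability. The function names, the rates (1, 2, 4) and the odd-length kernel are assumptions made for illustration and are not taken from MMFNet.

```python
def atrous_conv1d(signal, kernel, rate):
    """Zero-padded 1D atrous (dilated) convolution; output has the input's length.

    Implemented as cross-correlation, as is usual in deep learning libraries.
    Assumes an odd-length kernel so that the taps are centred on each sample.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < len(signal):   # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_rate_features(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several dilation rates, one branch per rate."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


# A single impulse shows how the receptive field widens with the rate.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
branches = multi_rate_features(impulse, kernel=[1, 1, 1])
for rate, response in branches.items():
    print(rate, response)
```

With a three-tap kernel, the branch at rate 1 responds over three adjacent samples, while the branch at rate 4 responds at samples four positions apart. A real 2D module would typically concatenate such branches along the channel dimension and fuse them with a further convolution.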

SUN Hanqi, PAN Chen, HE Lingmin, XU Zhijie. Remote Sensing Image Semantic Segmentation Network Based on Multimodal Feature Fusion[J]. Computer Engineering and Applications, 2022, 58(24): 256-264.
