Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (15): 234-242. DOI: 10.3778/j.issn.1002-8331.2311-0400

• Graphics and Image Processing •

Remote Sensing Image Semantic Segmentation Network Based on Multimodal Fusion

HU Yuxiang, YU Changhong, GAO Ming   

  1. School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
  • Online:2024-08-01 Published:2024-07-30

Abstract: Multimodal semantic segmentation networks can exploit complementary data from different modalities to improve segmentation accuracy. However, most existing multimodal segmentation models simply concatenate the data of the two modalities, overlooking the distinct high- and low-frequency characteristics of each modality, which leads to insufficient cross-modal feature extraction and suboptimal fusion. To address these issues, this paper proposes LHFNet (low feature and high feature fusion network), a remote sensing image semantic segmentation network that fuses IRRG images with DSM images. First, targeting the correlated structural features in the low-frequency components of each modality, a low-level feature extraction enhancement module is designed to strengthen the extraction of features from different modalities. Second, based on the mutually independent detail features in the high-frequency components of each modality, a high-level feature fusion module is designed to guide the fusion of features across modalities. Finally, to bridge the semantic gap between high- and low-frequency image features, a global atrous spatial pyramid pooling module is designed as a skip connection between high- and low-frequency information, enhancing the interaction between them. Experiments on the ISPRS Vaihingen and Potsdam datasets show that LHFNet achieves global accuracies of 88.17% and 90.53% respectively, outperforming single-modal segmentation networks such as SegNet and DeepLabv3+ as well as multimodal RGB-D segmentation networks such as RedNet and TSNet.
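The abstract does not give implementation details of LHFNet, so the following is only a minimal PyTorch sketch of the general scheme it describes: two encoder branches (IRRG and DSM), a low-level cross-modal enhancement step, a high-level fusion step, and an ASPP-based skip connection bridging low- and high-level features. All module names (DualBranchFusionNet, conv_bn_relu), channel widths, the gating mechanism, and the wiring are assumptions made for illustration, not the paper's actual design.

```python
# Illustrative sketch only: a hypothetical two-stream segmentation net fusing
# IRRG and DSM inputs, loosely following the abstract's description of LHFNet.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, stride=1, dilation=1):
    """3x3 conv -> BN -> ReLU building block (assumed, not from the paper)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride,
                  padding=dilation, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ASPP(nn.Module):
    """Atrous spatial pyramid pooling (as in DeepLab), used here as the
    skip connection between low-level and high-level features."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [conv_bn_relu(in_ch, out_ch, dilation=r) for r in rates])
        self.project = conv_bn_relu(out_ch * len(rates), out_ch)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))


class DualBranchFusionNet(nn.Module):
    """Hypothetical dual-branch net; channel sizes are arbitrary choices."""
    def __init__(self, num_classes=6, ch=64):
        super().__init__()
        # Separate shallow encoders per modality (low-frequency structure).
        self.irrg_low = conv_bn_relu(3, ch, stride=2)   # IRRG: 3 channels
        self.dsm_low = conv_bn_relu(1, ch, stride=2)    # DSM: 1 channel
        # Low-level cross-modal enhancement: gate each modality by both.
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, 2 * ch, 1), nn.Sigmoid())
        # Deeper per-modality encoders (high-frequency detail).
        self.irrg_high = conv_bn_relu(ch, 2 * ch, stride=2)
        self.dsm_high = conv_bn_relu(ch, 2 * ch, stride=2)
        # High-level fusion of the two modality streams.
        self.fuse_high = conv_bn_relu(4 * ch, 2 * ch)
        # ASPP skip connection from the enhanced low-level features.
        self.aspp_skip = ASPP(2 * ch, 2 * ch)
        self.classifier = nn.Conv2d(4 * ch, num_classes, 1)

    def forward(self, irrg, dsm):
        low_i, low_d = self.irrg_low(irrg), self.dsm_low(dsm)
        low = torch.cat([low_i, low_d], dim=1)
        low = low * self.gate(low)              # cross-modal enhancement
        low_i, low_d = low.chunk(2, dim=1)      # split back per modality
        high = self.fuse_high(
            torch.cat([self.irrg_high(low_i), self.dsm_high(low_d)], dim=1))
        skip = F.interpolate(self.aspp_skip(low), size=high.shape[-2:],
                             mode="bilinear", align_corners=False)
        out = self.classifier(torch.cat([high, skip], dim=1))
        return F.interpolate(out, scale_factor=4, mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    net = DualBranchFusionNet(num_classes=6)
    irrg = torch.randn(1, 3, 256, 256)   # infrared-red-green image
    dsm = torch.randn(1, 1, 256, 256)    # digital surface model
    print(net(irrg, dsm).shape)          # -> torch.Size([1, 6, 256, 256])
```

The key structural idea carried over from the abstract is that the two modalities are encoded separately, interact once at the low level and once at the high level, and that an ASPP skip connection (rather than a plain identity skip) bridges the semantic gap between low- and high-level features.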

Key words: multimodal semantic segmentation, feature fusion, feature extraction, multi-source remote sensing images