Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (24): 256-264. DOI: 10.3778/j.issn.1002-8331.2207-0010

• Graphics and Image Processing •

Remote Sensing Image Semantic Segmentation Network Based on Multimodal Feature Fusion

SUN Hanqi, PAN Chen, HE Lingmin, XU Zhijie   

  1. College of Information Engineering, China Jiliang University, Hangzhou 310018, China
  2. Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, China Jiliang University, Hangzhou 310018, China
  • Online: 2022-12-15 Published: 2022-12-15

多模态特征融合的遥感图像语义分割网络

孙汉淇,潘晨,何灵敏,胥智杰   

  1. 中国计量大学 信息工程学院,杭州 310018
  2. 中国计量大学 浙江省电磁波信息技术与计量检测重点实验室,杭州 310018

Abstract: Semantic segmentation of remote sensing images assigns a semantic label to every pixel of an image to produce a segmentation map, and it is widely used in fields such as land and resource planning and smart cities. In high-resolution remote sensing images, targets vary widely in size and scale and are often occluded by shadows, so similar ground objects and shadow-occluded objects are difficult to segment from a single modality. To address these problems, a remote sensing image semantic segmentation network, MMFNet, is proposed that fuses IRRG (infrared, red, green) images with DSM (digital surface model) images. The network adopts an encoder-decoder structure. The encoder uses a dual-input stream to extract the spectral features of the IRRG images and the height features of the DSM images simultaneously. The decoder uses residual decoding blocks (RDB) to extract the fused features and uses dense connections to strengthen feature propagation and reuse. A complex atrous spatial pyramid pooling (CASPP) module is proposed to extract multi-scale features from the skip connections. Experiments on the Vaihingen and Potsdam datasets provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) show that MMFNet achieves global accuracies of 90.44% and 90.70%, respectively, which is higher than that of general-purpose segmentation networks such as DeepLabV3+ and OCRNet and of networks designed specifically for these datasets, such as CEVO and UFMG_4.
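The global accuracy figures quoted in the abstract can be read as the fraction of pixels whose predicted class matches the reference label. The sketch below illustrates that reading on a toy label map. It is not the authors' evaluation code: the function name `global_accuracy`, the integer class ids, and the small example maps are assumptions made for illustration, and benchmark-specific conventions such as ignoring boundary pixels are not modeled.

```python
def global_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label.

    `pred` and `truth` are 2D lists of integer class ids with the same shape.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    same_shape = len(pred) == len(truth) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(pred, truth)
    )
    if not same_shape:
        raise ValueError("prediction and reference maps must have the same shape")

    total = sum(len(row) for row in truth)
    if total == 0:
        raise ValueError("empty label map")

    # Count pixel positions where the two maps agree.
    correct = sum(
        p == t
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
    )
    return correct / total


# Toy 3x4 label maps with four made-up classes (0..3).
truth = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 1],
]
pred = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],  # one pixel of class 2 predicted as class 1
    [3, 0, 2, 1],  # one pixel of class 3 predicted as class 0
]

print(f"global accuracy: {global_accuracy(pred, truth):.4f}")
```

In this toy case 10 of the 12 pixels agree. Because every pixel carries equal weight, the metric is dominated by the most frequent classes, which is why segmentation results are often reported alongside per-class scores as well.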

Key words: high-resolution remote sensing imagery, semantic segmentation, multiscale features, pyramid pooling

摘要: 遥感图像语义分割是指通过对遥感图像上每个像素分配语义标签并标注,从而形成分割图的过程,在国土资源规划、智慧城市等领域有着广泛的应用。高分辨率遥感图像存在目标大小尺度不一与阴影遮挡等问题,单一模态下对相似地物和阴影遮挡地物分割较为困难。针对上述问题,提出了将IRRG(infrared、red、green)图像与DSM(digital surface model)图像融合的遥感图像语义分割网络MMFNet。网络采用编码器-解码器的结构,编码层采用双输入流的方式同时提取IRRG图像的光谱特征和DSM图像的高度特征。解码器使用残差解码块(residual decoding block,RDB)提取融合后的特征,并使用密集连接的方式加强特征的传播和复用。提出复合空洞空间金字塔(complex atrous spatial pyramid pooling,CASPP)模块提取跳跃连接的多尺度特征。在国际摄影测量与遥感学会(international society for photogrammetry and remote sensing,ISPRS)提供的Vaihingen和Potsdam数据集上进行了实验,MMFNet分别取得了90.44%和90.70%的全局精确度,相比较与DeepLabV3+、OCRNet等通用分割网络和CEVO、UFMG_4等同数据集专用分割网络具有更高的分割精确度。

关键词: 高分辨率遥感图像, 语义分割, 多尺度特征, 金字塔池化