Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (24): 248-258. DOI: 10.3778/j.issn.1002-8331.2307-0167

• Graphics and Image Processing •


Improved SegFormer Remote Sensing Image Semantic Segmentation Network

ZHANG Hao, HE Lingmin, PAN Chen

  1. College of Information Engineering, China Jiliang University, Hangzhou 310018, China
  2. Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, China Jiliang University, Hangzhou 310018, China
  • Online: 2023-12-15  Published: 2023-12-15



Abstract: With the development of remote sensing technology, semantic segmentation of remote sensing images is increasingly applied in urban and rural resource management, urban and rural planning, and other fields. Because small unmanned aerial vehicles (UAVs) offer cost-effectiveness, flexibility, and ease of operation in remote sensing data acquisition, UAV photography has become the preferred method for collecting remote sensing image datasets. Owing to the low-altitude, oblique shooting characteristic of small UAVs, UAV images contain richer target detail and more complex target relationships than images captured by traditional remote sensing equipment, so traditional deep learning models based on local convolution can no longer handle this task. To address these issues, an improved remote sensing image semantic segmentation network based on SegFormer is proposed. First, an edge contour extraction module (ECEM) is added to the encoding layer to help the model extract shallow features of targets. Second, given the predominance of buildings in urban remote sensing images, a multi-scale atrous spatial pyramid pooling (MSASPP) module, in which global average pooling is replaced by multi-scale strip pooling (MSP), is added to the encoding layer to extract the features of elongated targets in the image. Finally, to overcome the drawback that the original decoder operation is unfavorable for restoring feature information, the decoder is redesigned with reference to the U-Net decoding layer: the features received from the encoding layer are fused before upsampling and SE channel attention operations are performed, strengthening feature propagation and fusion. The improved network is evaluated on the Vaihingen dataset provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) and on the UAVid UAV remote sensing semantic segmentation dataset, achieving a mean intersection over union (MIoU) of 90.30% and 77.90%, respectively, with higher segmentation accuracy than general segmentation networks such as DeepLabV3+ and Swin-Unet.
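The MSP module described above builds on strip pooling, which replaces the square window of global average pooling with long, narrow windows so that elongated targets such as buildings and roads accumulate context along their full row and column extent. A minimal NumPy sketch of the core idea, assuming a single feature map of shape (C, H, W); the paper's multi-scale variant, 1D convolutions, and gating are omitted, and `strip_pool` is a hypothetical name, not the authors' code:

```python
import numpy as np

def strip_pool(x):
    """Minimal strip-pooling sketch for a feature map x of shape (C, H, W).

    Horizontal strips average over the width, vertical strips over the
    height; broadcasting the two pooled maps back over (H, W) and summing
    gives every position long-range context along its row and column.
    """
    h_strip = x.mean(axis=2, keepdims=True)   # (C, H, 1): one value per row
    v_strip = x.mean(axis=1, keepdims=True)   # (C, 1, W): one value per column
    return h_strip + v_strip                  # (C, H, W) via broadcasting
```

In the full MSP module each pooled strip would additionally pass through a 1D convolution before fusion; this sketch only shows why a strip-shaped window suits long, thin targets better than a square one.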

Key words: high-resolution remote sensing imagery, semantic segmentation, multi-scale strip pooling, edge contour extraction module
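The redesigned decoder applies SE (squeeze-and-excitation) channel attention after fusing encoder features. A self-contained NumPy sketch of the standard SE computation, with hypothetical weight matrices `w1` and `w2` standing in for the learned fully connected layers of the reduction/expansion bottleneck:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation sketch for a feature map x of shape (C, H, W).

    Squeeze: global average pool each channel down to one scalar.
    Excitation: two small fully connected layers (w1: C -> C/r, w2: C/r -> C)
    with ReLU then sigmoid produce a per-channel weight in (0, 1).
    Scale: reweight each channel of x by its excitation weight.
    """
    s = x.mean(axis=(1, 2))                   # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))   # excitation: (C,)
    return x * e[:, None, None]               # scale channels
```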
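The reported metric, mean intersection over union (MIoU), averages per-class IoU over the classes present in the prediction or the ground truth. A small NumPy sketch of the standard computation (`mean_iou` is a hypothetical helper, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """MIoU from flat integer label arrays of equal length.

    For each class c: IoU_c = |pred==c AND target==c| / |pred==c OR target==c|.
    Classes absent from both prediction and ground truth are skipped so they
    do not drag the mean toward zero.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent everywhere: leave it out of the mean
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```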