Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 194-203. DOI: 10.3778/j.issn.1002-8331.2310-0302

• Graphics and Image Processing •

Semantic Segmentation Method for Remote Sensing Images Based on Improved Swin Transformer

WANG Yizhong, HU Yaqi, WU Xiaosuo, YAN Haowen, WANG Xiaocheng   

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  2. School of Mapping and Geoinformation, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Online: 2024-06-01 Published: 2024-05-31


Abstract: Accurately extracting ground-feature information from high-resolution remote sensing images is important for urban planning and land resource utilization. However, such images exhibit large scale differences between target objects and complex backgrounds, which easily lead to inaccurate extraction results, particularly for small-scale features. To address these problems, this paper proposes a novel dual-coding structure that captures both global semantic information and spatial detail information, and fuses feature information at different scales stage by stage to strengthen feature representation. A feature enhancement module (FEM) is constructed to reduce the loss of detail information during downsampling and to attend to more small-scale features. To further refine the feature information, channel attention and kernel attention are fused before upsampling, which integrates local features with their corresponding global spatial dependencies and improves the segmentation accuracy of target objects. The method achieves mIoU of 86.1% on the Potsdam dataset and 82.4% on the Vaihingen dataset. Comparative analysis with popular semantic segmentation models shows that the proposed method effectively alleviates the inaccurate segmentation of small- and medium-scale objects in remote sensing images and is well suited to the semantic segmentation of remote sensing images.

Key words: semantic segmentation, dual-coding structure, feature enhancement, fusion attention mechanism, small-scale features
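
The abstract reports results as mIoU (mean intersection over union). As a reminder of what that metric measures, the following is a minimal sketch in plain Python of the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. The function names, the toy label maps, and the choice to skip classes that appear in neither map are illustrative assumptions; the abstract does not state the paper's exact evaluation protocol (for example, which classes are included in the average).

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Classes that occur in neither the ground truth nor the prediction
    are skipped (an assumption; protocols differ on this point).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, actually another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened label maps with three hypothetical classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(truth, pred, 3)))
```

Because every class contributes equally to the average regardless of its pixel count, mIoU is sensitive to errors on classes that cover few pixels, which is why it is a natural metric for a method aimed at small-scale objects.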
