Computer Engineering and Applications, 2024, Vol. 60, Issue (15): 198-210. DOI: 10.3778/j.issn.1002-8331.2305-0251

Graphics and Image Processing

Semantic Segmentation Method of UAV Image Based on Window Attention Aggregation Swin Transformer

LI Junjie, YI Shi, HE Runhua, LIU Xi   

1. College of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China
2. Key Laboratory of Industrial Internet of Things & Networked Control, Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Online: 2024-08-01; Published: 2024-07-30

Abstract: When UAV remote sensing images are used for ground-object classification, existing classical semantic segmentation methods struggle to produce satisfactory results. Small ground objects are not salient in UAV images, the image backgrounds are complex, and ground-object texture is hard to distinguish. Building on Swin Transformer, this study proposes a semantic segmentation method called window attention aggregation Swin Transformer (WAA SwinT). To address these problems, the method aggregates attention across multiple windows to compute attention more accurately. This improves the accuracy and quality of classifying small ground objects in UAV remote sensing images. Borrowing the idea of nested connections, the method also uses a multi-level feature nested-connection decoder to improve the network structure, which yields finer segmentation of UAV remote sensing images. The method was evaluated on two urban UAV remote sensing datasets, UAVid and UDD, and compared against current classical semantic segmentation methods. The experimental results show that it achieves the best segmentation performance among the compared methods on both datasets. The method also markedly improves the accuracy of segmenting small objects and fine textures in UAV images.
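The abstract does not specify how the multi-window attention aggregation is computed. As a rough illustration only, the NumPy sketch below implements standard Swin-style window attention. It partitions the feature map into non-overlapping windows and runs self-attention inside each window. It then applies a hypothetical aggregation that averages the outputs obtained with several window sizes. The averaging rule, the function names, the single attention head, and the omission of shifted windows and relative position bias are all assumptions made for brevity. None of them should be read as the WAA SwinT design.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, nW, ws, ws, C)
    return x.reshape(-1, ws * ws, C)


def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: (num_windows, ws * ws, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, ws, nW, ws, C)
    return x.reshape(H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(x, ws, wq, wk, wv):
    """Single-head scaled dot-product self-attention inside each ws x ws window.

    Tokens attend only to tokens in the same window. Shifted windows and
    relative position bias are omitted for brevity.
    """
    H, W, C = x.shape
    win = window_partition(x, ws)                   # (N, ws*ws, C)
    q, k, v = win @ wq, win @ wk, win @ wv          # linear projections
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)  # (N, ws*ws, ws*ws)
    out = softmax(scores) @ v                       # (N, ws*ws, C)
    return window_reverse(out, ws, H, W)


def aggregated_window_attention(x, window_sizes, wq, wk, wv):
    """HYPOTHETICAL aggregation rule: average the window-attention outputs
    computed with several window sizes, so each position mixes local
    (small-window) and wider (large-window) context."""
    outs = [window_attention(x, ws, wq, wk, wv) for ws in window_sizes]
    return np.mean(outs, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C = 8, 8, 4
    x = rng.standard_normal((H, W, C))
    wq, wk, wv = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
    y = aggregated_window_attention(x, (2, 4, 8), wq, wk, wv)
    print(y.shape)  # (8, 8, 4)
```

Averaging is the simplest possible aggregation. A learned per-window-size weighting, or concatenation followed by a projection, would be equally plausible. The rule actually used in WAA SwinT should be taken from the paper's method section.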

Key words: UAV image, semantic segmentation, Swin Transformer, window attention aggregation