Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (12): 166-175. DOI: 10.3778/j.issn.1002-8331.2210-0331

• Graphics and Image Processing •

LSTFormer: Lightweight Semantic Segmentation Network Based on Swin Transformer

YANG Cheng, GAO Jianlin, ZHENG Meilin, DING Rong   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  • Online: 2023-06-15   Published: 2023-06-15

Abstract: To address the high computational complexity common to existing Transformer-based semantic segmentation networks, a lightweight semantic segmentation network based on Swin Transformer, LSTFormer, is proposed. Firstly, feature maps at multiple scales are extracted by the Swin Transformer backbone. Secondly, a full perception module and an improved cascaded fusion module fuse the feature maps of different scales across layers, narrowing the semantic gap between feature maps at different levels. Then, a single Swin Transformer block is introduced to refine the initial segmentation feature map, and its shifted-window self-attention mechanism improves the network's ability to classify individual pixels. Finally, the Dice loss function and the cross-entropy loss function are combined in the training stage to improve the segmentation performance and convergence speed of the network. Experimental results show that LSTFormer reaches 49.47% and 81.47% mIoU on ADE20K and Cityscapes, respectively. Compared with similar networks such as SETR and Swin-UPerNet, LSTFormer achieves comparable segmentation accuracy with fewer parameters and less computation.
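
Note on the combined training loss: the abstract states that a Dice loss and a cross-entropy loss are added during training, but this page does not give the exact formulation or weighting. The PyTorch sketch below is therefore only an illustrative, assumed implementation; the class name DiceCrossEntropyLoss, the equal loss weights, and the ignore_index value are hypothetical choices, not the authors' code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DiceCrossEntropyLoss(nn.Module):
        # Weighted sum of a soft Dice loss and a pixel-wise cross-entropy loss.
        # Assumed formulation: the paper's exact weights are not specified here.
        def __init__(self, num_classes, ce_weight=1.0, dice_weight=1.0,
                     ignore_index=255, eps=1e-6):
            super().__init__()
            self.num_classes = num_classes
            self.ce_weight = ce_weight
            self.dice_weight = dice_weight
            self.ignore_index = ignore_index
            self.eps = eps

        def forward(self, logits, target):
            # logits: (N, C, H, W) raw class scores; target: (N, H, W) integer labels.
            ce = F.cross_entropy(logits, target, ignore_index=self.ignore_index)

            probs = logits.softmax(dim=1)
            valid = (target != self.ignore_index).unsqueeze(1).float()    # (N, 1, H, W)
            safe_target = target.clamp(min=0, max=self.num_classes - 1)   # keep one_hot in range on ignored pixels
            one_hot = F.one_hot(safe_target, self.num_classes).permute(0, 3, 1, 2).float()

            probs = probs * valid          # exclude ignored pixels from both Dice terms
            one_hot = one_hot * valid

            # soft Dice computed per class over the batch, then averaged
            intersection = (probs * one_hot).sum(dim=(0, 2, 3))
            cardinality = (probs + one_hot).sum(dim=(0, 2, 3))
            dice = 1.0 - (2.0 * intersection + self.eps) / (cardinality + self.eps)

            return self.ce_weight * ce + self.dice_weight * dice.mean()

As a usage example under these assumptions, the loss could be instantiated for ADE20K as criterion = DiceCrossEntropyLoss(num_classes=150) and applied to the network's segmentation logits and the ground-truth label map at each training step.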

Key words: lightweight semantic segmentation, Swin Transformer, cross-layer fusion, self-attention mechanism, loss function
