Computer Engineering and Applications, 2024, Vol. 60, Issue (4): 142-152. DOI: 10.3778/j.issn.1002-8331.2209-0156

• Pattern Recognition and Artificial Intelligence •

Efficient Cross-Domain Transformer Few-Shot Semantic Segmentation Network

FANG Hong, LI Desheng, JIANG Guangjie

  1. School of Mathematics, Physics and Statistics, Shanghai Polytechnic University, Shanghai 201209, China
  2. School of Resources and Environmental Engineering, Shanghai Polytechnic University, Shanghai 201209, China
  • Online: 2024-02-15 Published: 2024-02-15

Abstract: Few-shot semantic segmentation aims to learn the features of a target class from only a few annotated samples and then segment that class. Mainstream approaches have two main problems: training is inefficient, and meta-training and meta-testing are carried out in the same data domain. To address these problems, this paper proposes SGFNet, an efficient, cross-domain few-shot semantic segmentation network based on Transformer. In the encoding layer, a siamese network is built from a weight-sharing MixVisionTransformer to extract image features from the support set and the query set. In the relation computation layer, the Hadamard product of each support image feature vector and its corresponding mask is computed to extract high-dimensional features of the target class, and the relation between these features and the query image features is then calculated. In the decoding layer, the MLP-based decoder is improved into a residual decoder, which decodes features from different levels into the final segmentation result. Experiments show that the model needs only 1.5~4.0 h of training on the FSS-1000 dataset with a single 3090 GPU to achieve the best result on FSS-1000, a 1-shot mIoU of 87.0%. In cross-domain tests on the PASCAL-5i and COCO-20i datasets, it matches the performance of non-cross-domain methods, with 1-shot mIoU of 60.4% and 33.0%, respectively, which demonstrates that the model is efficient and works across domains.

Key words: few-shot semantic segmentation (FSS), cross-domain, Transformer, few-shot learning (FSL), semantic segmentation
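
The relation computation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the SGFNet implementation: the abstract specifies only the Hadamard product of support features and mask, so the masked average pooling and the cosine-similarity relation used here are assumptions made for illustration, and the function names `masked_support_features` and `cosine_relation_map` are hypothetical.

```python
import numpy as np


def masked_support_features(support_feat, support_mask):
    """Hadamard product of a (C, H, W) feature map with an (H, W) binary mask.

    The mask is broadcast over the channel axis, so background locations
    are zeroed and only target-class features remain.
    """
    return support_feat * support_mask[None, :, :]


def cosine_relation_map(query_feat, masked_feat, support_mask, eps=1e-8):
    """Relate query features to the masked support features.

    Assumption (not stated in the abstract): the masked features are
    averaged over the foreground into one prototype vector, and the
    relation is the cosine similarity between that prototype and the
    query feature at every spatial location.
    """
    # Masked average pooling: sum over space, divide by foreground area.
    area = support_mask.sum() + eps
    prototype = masked_feat.sum(axis=(1, 2)) / area  # shape (C,)

    # Normalize along the channel axis before taking the dot product.
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    return np.einsum("chw,c->hw", q, p)  # shape (H, W)
```

Given a query feature map of shape `(C, H, W)`, `cosine_relation_map` returns an `(H, W)` map whose high values mark locations that resemble the support target; in a network of the kind the abstract describes, such a map would be passed on to the decoder.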