Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (23): 219-227. DOI: 10.3778/j.issn.1002-8331.2207-0005

• Graphics and Image Processing •

Transformer-Based Few-Shot and Fine-Grained Image Classification Method

LU Yan, WANG Yangping, WANG Wenrun   

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  2. National Virtual Simulation Experimental Teaching Center for Rail Transit Information and Control, Lanzhou 730070, China
  • Online: 2023-12-01  Published: 2023-12-01

Abstract: To address the problems of a single similarity measure and poor fine-grained feature extraction in few-shot and fine-grained image classification tasks, this paper proposes a Transformer-based few-shot and fine-grained image classification method, which overcomes the poor classification performance that few-shot learning suffers in fine-grained image classification when only a small number of samples is available. Firstly, a new module, the CBG Transformer Block, is constructed with a multi-axis attention module and a convolution operator as its basic components; repeatedly stacking this module improves the feature extraction ability of the network. Secondly, a dual similarity module consisting of a relation network and a cosine network is adopted for similarity measurement, which avoids the similarity bias caused by a single measure when the amount of training data is small. Finally, the prediction result is obtained by averaging the two similarity scores. Experimental results show that the proposed method achieves 5-way 5-shot classification accuracies of 82.70%, 74.22% and 69.68% on three public fine-grained image datasets, CUB-200-2011, Stanford Cars and Stanford Dogs, respectively, demonstrating excellent performance in few-shot and fine-grained image classification tasks.
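
The following PyTorch sketch is an illustration of the two ideas above, not the authors' released code: a CBG-style block that chains a convolution stage with block (local window) attention and grid (dilated, global) attention, as one plausible reading of multi-axis attention, and a dual similarity head that averages a cosine score with a relation-network-style learned score over class prototypes. The class names (CBGTransformerBlock, DualSimilarityHead), layer sizes, partition size p and the prototype-based episode setup are assumptions made for illustration.

# --- Illustrative sketch (not the paper's released code) ---------------------
# Assumes: H and W divisible by the partition size p; MaxViT-style block/grid
# attention as one plausible reading of "multi-axis attention"; layer sizes,
# class names and the prototype-based episode setup are hypothetical.
import torch
import torch.nn as nn


def block_partition(x, p):
    # (B, H, W, C) -> (B*H/p*W/p, p*p, C): tokens grouped into local p x p windows
    B, H, W, C = x.shape
    x = x.reshape(B, H // p, p, W // p, p, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, p * p, C)


def block_reverse(win, p, B, H, W):
    x = win.reshape(B, H // p, W // p, p, p, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


def grid_partition(x, p):
    # (B, H, W, C) -> (B*H/p*W/p, p*p, C): tokens sampled on a dilated p x p grid
    B, H, W, C = x.shape
    x = x.reshape(B, p, H // p, p, W // p, C)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, p * p, C)


def grid_reverse(win, p, B, H, W):
    x = win.reshape(B, H // p, W // p, p, p, -1)
    return x.permute(0, 3, 1, 4, 2, 5).reshape(B, H, W, -1)


class CBGTransformerBlock(nn.Module):
    # Convolution stage, then block (local window) attention, then grid (global) attention.
    def __init__(self, dim, heads=4, p=7):
        super().__init__()
        self.p = p
        self.conv = nn.Sequential(                       # depthwise-separable conv stage
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.BatchNorm2d(dim),
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )
        self.norm1 = nn.LayerNorm(dim)
        self.block_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.grid_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                # x: (B, C, H, W)
        x = x + self.conv(x)                             # convolutional branch with residual
        B, C, H, W = x.shape
        x = x.permute(0, 2, 3, 1)                        # -> (B, H, W, C)
        w = block_partition(self.norm1(x), self.p)       # attention inside each p x p window
        w = self.block_attn(w, w, w, need_weights=False)[0]
        x = x + block_reverse(w, self.p, B, H, W)
        g = grid_partition(self.norm2(x), self.p)        # attention across the p x p grid
        g = self.grid_attn(g, g, g, need_weights=False)[0]
        x = x + grid_reverse(g, self.p, B, H, W)
        return x.permute(0, 3, 1, 2)                     # back to (B, C, H, W)


class DualSimilarityHead(nn.Module):
    # Averages a cosine score and a relation-network-style learned score
    # between query embeddings and class prototypes.
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.relation = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),                                # relation score already in [0, 1]
        )

    def forward(self, query, protos):
        # query: (Q, D) query embeddings; protos: (N, D) class prototypes
        Q, N = query.size(0), protos.size(0)
        cos = torch.cosine_similarity(query.unsqueeze(1), protos.unsqueeze(0), dim=-1)
        cos = (cos + 1) / 2                              # rescale cosine score to [0, 1]
        pairs = torch.cat([query.unsqueeze(1).expand(Q, N, -1),
                           protos.unsqueeze(0).expand(Q, N, -1)], dim=-1)
        rel = self.relation(pairs).squeeze(-1)           # (Q, N) learned relation scores
        return (cos + rel) / 2                           # final score: mean of the two


# Example: one CBG block on a 28 x 28 feature map, then a 5-way episode with
# 10 query images; prototypes would be mean embeddings of each class's support set.
blk = CBGTransformerBlock(dim=64, heads=4, p=7)
feat = blk(torch.randn(2, 64, 28, 28))                   # (2, 64, 28, 28)
head = DualSimilarityHead(dim=64)
scores = head(torch.randn(10, 64), torch.randn(5, 64))   # (10, 5) scores in [0, 1]
pred = scores.argmax(dim=1)                              # predicted class per query

In this sketch both similarity branches are rescaled to the range [0, 1] before averaging, so that neither the cosine branch nor the learned relation branch dominates the final prediction score.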

Key words: fine-grained image classification, few-shot learning, multi-axis attention, conv-block-grid (CBG) Transformer Block, dual similarity