Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (22): 251-260. DOI: 10.3778/j.issn.1002-8331.2307-0250

• Graphics and Image Processing •

Improved MobileViT Algorithm for Small Samples

ZHANG Bushi, FAN Hong   

  1. School of Logistics Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Online: 2024-11-15   Published: 2024-11-14


Abstract: To improve the classification ability, training and convergence speed, and inference speed of the Transformer-based MobileViT algorithm on small-sample data, two modules are proposed and inserted into MobileViT: convolutional max-pooling downsampling (CMP) and multi-branch residual feature fusion (MR-FF). They are used, respectively, to reduce the number of model parameters, reduce feature redundancy, and prevent the loss of input features. Taking the MobileViT variant with the fewest parameters as an example, comparative experiments are conducted on the Oxford Flower102 and Mini-ImageNet small-sample datasets. With the CMP and MR-FF modules inserted, MobileViT achieves test-accuracy gains of 12.9 and 9.4 percentage points on the two datasets, respectively, a 17% increase in training speed, and a 0.31 ms reduction in inference time. It is further found that inserting only the CMP module into MobileViT yields higher classification accuracy and shorter inference time on small-sample datasets of fewer than 60,000 images. Finally, the improved MobileViT is compared with five state-of-the-art image classification algorithms and achieves the best test results on small-sample data.
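The abstract does not specify the internal layout of the CMP and MR-FF modules. As a minimal illustrative sketch only, a convolution-plus-max-pooling downsampling block and a multi-branch residual fusion block might be structured as below; all kernel sizes, channel counts, branch choices, and layer names are assumptions, not the authors' published design.

```python
# Illustrative sketch only: the exact CMP and MR-FF configurations are not
# given in the abstract, so every design choice here is an assumption.
import torch
import torch.nn as nn


class CMP(nn.Module):
    """Hypothetical convolutional max-pooling downsampling block:
    a stride-1 convolution followed by 2x2 max pooling, standing in for a
    strided convolution to reduce parameters and feature redundancy."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.bn(self.conv(x))))


class MRFF(nn.Module):
    """Hypothetical multi-branch residual feature fusion block:
    parallel branches with different receptive fields are fused with the
    identity path so that the input features are not lost."""

    def __init__(self, channels: int):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU(),
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU(),
        )
        # Fuse identity + two branches back to the original channel count.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([x, self.branch1(x), self.branch3(x)], dim=1))
        return fused + x  # residual connection preserves the input features


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)
    y = MRFF(32)(CMP(32, 32)(x))
    print(y.shape)  # torch.Size([1, 32, 32, 32])
```

In this sketch the max-pooling path carries the spatial downsampling (no learned parameters), which is consistent with the stated goal of cutting parameters, while the residual fusion keeps an identity path to the output so that downstream MobileViT blocks still see the original features.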

Key words: max pooling, small sample, Transformer, image classification
