Transformer-Based Few-Shot and Fine-Grained Image Classification Method
LU Yan, WANG Yangping, WANG Wenrun
1.School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2.National Virtual Simulation Experimental Teaching Center for Rail Transit Information and Control, Lanzhou 730070, China
LU Yan, WANG Yangping, WANG Wenrun. Transformer-Based Few-Shot and Fine-Grained Image Classification Method[J]. Computer Engineering and Applications, 2023, 59(23): 219-227.
[1] FINN C,ABBEEL P,LEVINE S.Model-agnostic meta-learning for fast adaptation of deep networks[C]//International Conference on Machine Learning,2017:1126-1135.
[2] SUNG F,YANG Y,LI Z,et al.Learning to compare:relation network for few-shot learning[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition,2018.
[3] LI W,WANG L,XU J,et al.Revisiting local descriptor based image-to-class measure for few-shot learning[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),2019.
[4] LI X X,JI X H,LI B.Deep learning method for fine-grained image categorization[J].Journal of Frontiers of Computer Science and Technology,2021,15(10):1830-1842.
[5] WANG Y,YAO Q,KWOK J T,et al.Generalizing from a few examples:a survey on few-shot learning[J].ACM Computing Surveys(CSUR),2020,53(3):1-34.
[6] KOCH G,ZEMEL R,SALAKHUTDINOV R.Siamese neural networks for one-shot image recognition[C]//International Conference on Machine Learning,Lille,France,2015.
[7] VINYALS O,BLUNDELL C,LILLICRAP T,et al.Matching networks for one shot learning[C]//Neural Information Processing Systems,Barcelona,2016:3630-3638.
[8] SNELL J,SWERSKY K,ZEMEL R.Prototypical networks for few-shot learning[C]//Neural Information Processing Systems,Long Beach,2017:4077-4087.
[9] ZHANG C,CAI Y,LIN G,et al.DeepEMD:few-shot image classification with differentiable earth mover’s distance and structured classifiers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2020:12203-12213.
[10] LI A,HUANG W,LAN X,et al.Boosting few-shot learning with adaptive margin loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2020:12576-12584.
[11] LI X X,WU J J,SUN Z,et al.BSNet:bi-similarity network for few-shot fine-grained image classification[J].IEEE Transactions on Image Processing,2021,30:1318-1331.
[12] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems,2017:5998-6008.
[13] WANG X,GIRSHICK R,GUPTA A,et al.Non-local neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition,2018:7794-7803.
[14] DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[15] LIU Z,LIN Y,CAO Y,et al.Swin transformer:hierarchical vision transformer using shifted windows[C]//IEEE/CVF International Conference on Computer Vision,2021:10012-10022.
[16] TU Z Z,TALEBI H,ZHANG H,et al.MaxViT:multi-axis vision transformer[J].arXiv:2204.01697v4,2022.
[17] HO J,KALCHBRENNER N,WEISSENBORN D,et al.Axial attention in multidimensional transformers[J].arXiv:1912.12180,2019.
[18] WANG Q,WU B,ZHU P,et al.ECA-Net:efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR),2020.
[19] WAH C,BRANSON S,WELINDER P,et al.The caltech-ucsd birds-200-2011 dataset[D].California Institute of Technology,2011.
[20] KRAUSE J,STARK M,DENG J,et al.3D object representations for fine-grained categorization[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops,Sydney,2013:554-561.
[21] KHOSLA A,JAYADEVAPRAKASH N,YAO B,et al.Novel dataset for fine-grained image categorization:stanford dogs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,Colorado Springs,USA,2012:3181866.
[22] SELVARAJU R R,COGSWELL M,DAS A,et al.Grad-CAM:visual explanations from deep networks via gradient-based localization[C]//Proceedings of the IEEE International Conference on Computer Vision,2017:618-626.