[1] BEHERA A, WHARTON Z, HEWAGE P, et al. Context-aware attentional pooling (CAP) for fine-grained visual classification[J]. arXiv:2101.06635, 2021.
[2] ZHANG N, DONAHUE J, GIRSHICK R, et al. Part-based R-CNNs for fine-grained category detection[C]//Proceedings of the European Conference on Computer Vision, 2014: 834-849.
[3] BRANSON S, VAN HORN G, BELONGIE S, et al. Bird species categorization using pose normalized deep convolutional nets[J]. arXiv:1406.2952, 2014.
[4] SHIH K J, MALLYA A, SINGH S, et al. Part localization using multi-proposal consensus for fine-grained categorization[J]. arXiv:1507.06332, 2015.
[5] HUANG S, XU Z, TAO D, et al. Part-stacked CNN for fine-grained visual categorization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1173-1182.
[6] YU C, ZHAO X, ZHENG Q, et al. Hierarchical bilinear pooling for fine-grained visual recognition[C]//Proceedings of the European Conference on Computer Vision, 2018: 574-589.
[7] ZHENG H, FU J, MEI T, et al. Learning multi-attention convolutional neural network for fine-grained image recognition[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 5209-5217.
[8] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[9] HE J, CHEN J, LIU S, et al. TransFG: a transformer architecture for fine-grained recognition[C]//Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2022: 1174-1182.
[10] HU Y Q, JIN X, ZHANG Y, et al. RAMS-Trans: recurrent attention multi-scale transformer for fine-grained image recognition[C]//Proceedings of the 29th ACM International Conference on Multimedia, 2021: 4239-4248.
[11] ZHUANG P, WANG Y, QIAO Y. Learning attentive pairwise interaction for fine-grained classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 13130-13137.
[12] CHEN X N, HSIEH C J, GONG B Q. When vision transformers outperform ResNets without pretraining or strong data augmentations[J]. arXiv:2106.01548, 2021.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. arXiv:1706.03762, 2017.
[14] WANG W H, XIE E Z, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 548-558.
[15] TOLSTIKHIN I, HOULSBY N, KOLESNIKOV A, et al. MLP-Mixer: an all-MLP architecture for vision[J]. arXiv:2105.01601, 2021.
[16] HENDRYCKS D, GIMPEL K. Gaussian error linear units (GELUs)[J]. arXiv:1606.08415, 2016.
[17] BA J L, KIROS J R, HINTON G E. Layer normalization[J]. arXiv:1607.06450, 2016.
[18] WELINDER P, BRANSON S, MITA T, et al. Caltech-UCSD birds-200-2011 (CUB-200-2011)[EB/OL]. [2023-07-05]. https://www.vision.caltech.edu/datasets/cub_200_2011/.
[19] KHOSLA A, JAYADEVAPRAKASH N, YAO B, et al. Novel dataset for fine-grained image categorization: Stanford dogs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[20] KRAUSE J, STARK M, DENG J, et al. 3D object representations for fine-grained categorization[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013: 554-561.
[21] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[22] LUO W, YANG X, MO X, et al. Cross-X learning for fine-grained visual categorization[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 8241-8250.
[23] YET O Z, RASSEM T H, RAHMAN M A, et al. Improved attentive pairwise interaction (API-Net) for fine-grained image classification[C]//Proceedings of the Emerging Technology in Computing, Communication and Electronics, 2021: 1-6.
[24] DU R, CHANG D, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches[C]//Proceedings of the European Conference on Computer Vision, 2020: 153-168.
[25] LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[J]. arXiv:2103.14030, 2021.
[26] WANG J, YU X, GAO Y. Feature fusion vision transformer for fine-grained visual categorization[J]. arXiv:2107.02341, 2021.
[27] SUN H, HE X, PENG Y. SIM-Trans: structure information modeling transformer for fine-grained visual categorization[J]. arXiv:2208.14607, 2022.
[28] YE S, YU S, HOU W, et al. Coping with change: learning invariant and minimum sufficient representations for fine-grained visual categorization[J]. arXiv:2306.04893, 2023.