Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (19): 53-63. DOI: 10.3778/j.issn.1002-8331.2201-0374
MA Yao, ZHI Min, YIN Yanjun, PING Ping
Online: 2022-10-01
Published: 2022-10-01
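Abstract: Fine-grained image recognition aims to distinguish subcategories within a broader category of images. Because images of different subcategories differ only subtly, the task is challenging. As deep learning has advanced, deep-learning-based methods have become increasingly capable of localizing discriminative parts and representing features. Algorithms built on convolutional neural networks (CNN) and Transformers in particular have greatly improved fine-grained recognition accuracy, and the field has developed markedly. To trace the development of these two families of methods in fine-grained image recognition, this paper reviews recent methods in the field that use only category labels. It first introduces the concept of fine-grained image recognition and describes the mainstream fine-grained image datasets in detail; it then presents CNN-based and Transformer-based fine-grained image recognition methods and their performance; finally, it summarizes future research directions for fine-grained image recognition.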
MA Yao, ZHI Min, YIN Yanjun, PING Ping. Review of Applications of CNN and Transformer in Fine-Grained Image Recognition[J]. Computer Engineering and Applications, 2022, 58(19): 53-63.