Variable Size for Recurrent Attention Model and Application Research

doi:10.3778/j.issn.1002-8331.2012-0056

Abstract

Abstract: Visual attention models have been applied to fine-grained image classification, where they automatically locate discriminative local regions of an image and capture their features. However, the input size at each step is fixed while the size of a discriminative region is not known in advance, so the model cannot capture all of the relevant features accurately and classification accuracy drops. This paper proposes a variable size recurrent attention model (VSRAM). Unlike the earlier recurrent attention model (RAM), which uses a fixed input size, VSRAM optimizes an attention policy and a size sampling policy so that it learns the position and size of its next input by itself, which reduces the total input image area and increases processing speed. Experimental results show that dynamically adjusting the input size achieves the same recognition accuracy as RAM while significantly reducing the total amount of computation and increasing speed.

Key words: fine-grained image classification, reinforcement learning, variable size
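The idea of choosing both a position and a size for each input can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_glimpse` and `total_glimpse_area`, the normalized [-1, 1] coordinate convention, the zero padding, and the example glimpse sizes are all assumptions made for illustration. In the paper the position and size come from learned policies; here they are simply passed in by hand.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # a grayscale image as rows of pixel values


def extract_glimpse(image: Image, center: Tuple[float, float], size: int) -> Image:
    """Crop a size x size patch around `center`, given as (x, y) in [-1, 1].

    Pixels that fall outside the image are filled with 0.0 (an assumption).
    """
    height, width = len(image), len(image[0])
    # Map normalized coordinates to pixel indices.
    cx = int(round((center[0] + 1.0) / 2.0 * (width - 1)))
    cy = int(round((center[1] + 1.0) / 2.0 * (height - 1)))
    top, left = cy - size // 2, cx - size // 2
    patch: Image = []
    for r in range(top, top + size):
        row = []
        for c in range(left, left + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0.0)
        patch.append(row)
    return patch


def total_glimpse_area(sizes: Sequence[int]) -> int:
    """Total number of pixels read over a sequence of square glimpses."""
    return sum(s * s for s in sizes)


# Hypothetical 8 x 8 image whose pixel values are all nonzero.
image = [[float(r * 8 + c + 1) for c in range(8)] for r in range(8)]
corner = extract_glimpse(image, (-1.0, -1.0), 2)

fixed_area = total_glimpse_area([12, 12, 12])   # fixed-size glimpses
variable_area = total_glimpse_area([12, 8, 4])  # shrinking glimpses
```

With these made-up sizes, three fixed 12-pixel glimpses read 432 pixels in total, while glimpses of 12, 8, and 4 pixels read 224, which is the kind of input-area saving the abstract refers to.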

LYU Dongjian, WANG Chunli. Variable Size for Recurrent Attention Model and Application Research[J]. Computer Engineering and Applications, 2022, 58(12): 243-248.
