Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (12): 243-248. DOI: 10.3778/j.issn.1002-8331.2012-0056

• Graphics and Image Processing •

Research on a Variable-Size Recurrent Attention Model and Its Application

吕冬健,王春立   

  1. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
  • Published: 2022-06-15 Online: 2022-06-15

Variable Size for Recurrent Attention Model and Application Research

LYU Dongjian, WANG Chunli   

  1. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
  • Online: 2022-06-15 Published: 2022-06-15

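Abstract: Visual attention models have been applied to automatically locate local regions of fine-grained images, capturing discriminative features in order to classify the images. However, the size of the image patch fed to the model at each step is fixed, whereas the size of the discriminative regions is uncertain, so the model cannot accurately capture all the features of an image and classification accuracy drops. This paper proposes a variable-size recurrent attention model. Compared with the earlier recurrent attention network with a fixed input size, the model optimizes its attention policy and its size-generation policy so that it learns on its own the position and size of the next input patch, which reduces the total input image area and therefore increases processing speed. Experimental results show that dynamically adjusting the input size significantly reduces the total amount of computation and increases speed while maintaining the same recognition accuracy as the visual attention model.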

Key words: fine-grained image classification, reinforcement learning, variable size

Abstract: Visual attention models have been applied to image recognition tasks, where they automatically locate the discriminative local parts of a fine-grained image in order to capture distinguishing features. However, the input image size is fixed while the size of the discriminative part is uncertain, so the model cannot capture all the features of an image precisely, and classification accuracy is reduced. This paper proposes a variable size recurrent attention network (VSRAM). Unlike the previous recurrent attention network (RAM), which uses a fixed input size, the VSRAM optimizes an attention policy and a size sampling policy so that it learns by itself the position and size of the next input image, which reduces the total input image area and increases processing speed. Experimental results show that dynamically adjusting the size of the input image achieves the same recognition accuracy as RAM while substantially reducing the total input image area and increasing speed.

Key words: fine-grained image classification, reinforcement learning, variable size
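
The abstract's central idea, choosing both the position and the size of the next input patch so that the total input area shrinks, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the helper names `extract_glimpse` and `total_glimpse_area`, the convention that locations are normalised to [-1, 1], zero padding outside the image, and the example size sequences. The learned attention and size policies themselves are not modelled; the sketch only shows the cropping step and the area bookkeeping.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def extract_glimpse(image: Image, center: Tuple[float, float], size: int) -> Image:
    """Crop a size x size patch centred at `center`.

    `center` is (x, y) normalised to [-1, 1] (an assumed convention);
    pixels that fall outside the image are zero-padded.
    """
    height, width = len(image), len(image[0])
    # Map normalised coordinates to pixel indices.
    cx = round((center[0] + 1.0) / 2.0 * (width - 1))
    cy = round((center[1] + 1.0) / 2.0 * (height - 1))
    half = size // 2
    patch: Image = []
    for row in range(cy - half, cy - half + size):
        patch_row = []
        for col in range(cx - half, cx - half + size):
            inside = 0 <= row < height and 0 <= col < width
            patch_row.append(image[row][col] if inside else 0.0)
        patch.append(patch_row)
    return patch


def total_glimpse_area(sizes: Sequence[int]) -> int:
    """Total number of pixels processed over a sequence of square glimpses."""
    return sum(s * s for s in sizes)


# Hypothetical size sequences: four fixed 8x8 glimpses versus a
# variable-size sequence that shrinks once the region is localised.
fixed_area = total_glimpse_area([8, 8, 8, 8])
variable_area = total_glimpse_area([8, 4, 4, 2])
```

In this made-up example the variable-size sequence processes fewer pixels than the fixed-size one, which is the kind of saving the abstract attributes to dynamically adjusting the input size; the actual sizes and savings in the paper depend on the learned policies.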