Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (8): 9-16. DOI: 10.3778/j.issn.1002-8331.1812-0047

• Hot Topics and Reviews •

Deep Neural Network Based on Attention Convolution Module for Image Recognition

YUAN Jiajie, ZHANG Ling, CHEN Yunhua

  1. College of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2019-04-15 Published: 2019-04-15

Abstract: Deeply fusing the intermediate-layer branches of a deep neural network to obtain a base network that can share useful information, thereby optimizing information flow and improving performance, is a current challenge in deep neural network research. To address it, this paper proposes an image recognition method based on a deep neural network with an attention convolution module. The improved module has two parts: a trunk branch and a soft branch. The trunk branch consists of two groups of residual modules, which makes the module applicable to other deep neural networks. The soft branch derives attention feature maps from a given intermediate feature map along two dimensions (spatial and channel) and uses them to adjust the input intermediate feature map, strengthening useful information and suppressing useless information. The improved convolutional residual module resolves mismatches between input and output sizes, strengthens the key information in the image, and promotes information flow through the network. Experiments on the CIFAR-10, CIFAR-100, CK+, and AVEC2017 datasets show that, applied to ResNet-50, the proposed method improves recognition accuracy by 0.9% to 1.2% over the method proposed by Hu, with a difference in training time of less than 0.3%.

Key words: image recognition, residual module, attention, deep neural network
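The soft branch described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the layer configuration, so everything below is an assumption made for illustration: the function names (`channel_attention`, `spatial_attention`, `soft_branch`) are hypothetical, the learned layers that would normally produce the attention maps are replaced by parameter-free average pooling followed by a sigmoid, and the trunk branch (two groups of residual modules) is not modeled. It shows only the general idea of rescaling a feature map along the channel and spatial dimensions, not the authors' module.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x):
    """Assumed channel attention: one weight per channel.

    x has shape (C, H, W). Each channel is average-pooled over its
    spatial extent and squashed to (0, 1). Returns shape (C, 1, 1).
    """
    pooled = x.mean(axis=(1, 2), keepdims=True)
    return sigmoid(pooled)


def spatial_attention(x):
    """Assumed spatial attention: one weight per pixel location.

    The feature map is averaged across channels and squashed to (0, 1).
    Returns shape (1, H, W).
    """
    pooled = x.mean(axis=0, keepdims=True)
    return sigmoid(pooled)


def soft_branch(x):
    """Rescale the input feature map by both attention maps.

    Broadcasting applies the (C, 1, 1) channel weights and the
    (1, H, W) spatial weights to every element of x.
    """
    return x * channel_attention(x) * spatial_attention(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(4, 8, 8))  # hypothetical (C, H, W) input
    adjusted = soft_branch(feature_map)
    print(adjusted.shape)  # (4, 8, 8): same shape as the input
```

Because both attention maps lie strictly between 0 and 1, this sketch can only attenuate activations; a real module would typically learn the maps and combine the result with the trunk branch output, details the abstract leaves unspecified.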