Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (20): 150-156. DOI: 10.3778/j.issn.1002-8331.2101-0369


Improved Lightweight Attention Model Based on CBAM

FU Guodong, HUANG Jin, YANG Tao, ZHENG Siyu   

  1. School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
  Online: 2021-10-15  Published: 2021-10-21


Abstract:

In recent years, attention models have been widely used in computer vision: adding an attention module to a convolutional neural network can significantly improve its performance. However, most existing methods focus on developing ever more complex attention modules to give the convolutional neural network stronger feature representation capability, which inevitably increases model complexity. To strike a balance between performance and complexity, a lightweight EAM (Efficient Attention Module) is proposed as an optimization of CBAM. In CBAM's channel attention module, the fully connected layers are replaced with a one-dimensional convolution to aggregate information across channels; in CBAM's spatial attention module, the large convolution kernel is replaced with a dilated convolution to enlarge the receptive field and aggregate broader spatial context information. After the module is integrated into YOLOv4 and tested on the VOC2012 dataset, mAP increases by 3.48 percentage points. Experimental results show that the proposed attention module introduces only a small number of parameters while substantially improving network performance.
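The two substitutions described above (a one-dimensional convolution in place of CBAM's fully connected layers for channel attention, and a dilated convolution in place of the large spatial kernel) can be illustrated with a short PyTorch sketch. This is a minimal reconstruction, not the authors' code: the 1D kernel size of 3, the dilation rate of 2, and the use of average pooling alone for the channel descriptor are assumptions, and the paper's exact hyperparameters may differ (CBAM, for instance, also uses max pooling in its channel branch).

```python
# Minimal sketch of an EAM-style lightweight attention module (assumed
# hyperparameters; illustrative, not the paper's reference implementation).
import torch
import torch.nn as nn


class EAM(nn.Module):
    def __init__(self, kernel_size=3, dilation=2):
        super().__init__()
        # Channel attention: a 1D convolution over the per-channel
        # descriptor replaces CBAM's fully connected (MLP) layers.
        self.channel_conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                                      padding=kernel_size // 2, bias=False)
        # Spatial attention: a small dilated convolution replaces CBAM's
        # large 7x7 kernel while keeping a comparable receptive field.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=3,
                                      padding=dilation, dilation=dilation,
                                      bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.shape
        # --- Channel attention ---
        # Global average pooling yields one descriptor per channel.
        avg = x.mean(dim=(2, 3))                    # (B, C)
        ca = self.channel_conv(avg.unsqueeze(1))    # (B, 1, C)
        x = x * self.sigmoid(ca).view(b, c, 1, 1)
        # --- Spatial attention ---
        # Channel-wise average and max maps, concatenated as in CBAM.
        avg_map = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        sa = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * self.sigmoid(sa)


# Usage: drop in after a convolutional block, e.g. inside a YOLOv4 backbone.
feat = torch.randn(2, 64, 32, 32)
print(EAM()(feat).shape)  # torch.Size([2, 64, 32, 32])
```

Replacing the MLP with a 1D convolution keeps the channel-attention parameter count at the kernel size (here 3) rather than quadratic in the channel count, and the dilated 3x3 spatial kernel covers a 5x5 receptive field with fewer weights than a dense large kernel, which is consistent with the abstract's claim of a small parameter overhead.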

Key words: convolutional neural network, attention mechanism, object detection
