Frequency Mixture Attention Module

doi:10.3778/j.issn.1002-8331.2106-0094

Abstract

Attention models for object detection, represented by SENet and CBAM, offer a new route to improving the accuracy of small-object detection. In their design, however, these models simplify one step: they use global average pooling or max pooling as the preprocessing method. FcaNet replaces average pooling with the discrete cosine transform (DCT) when preprocessing for channel attention, which increases feature diversity, but it does not discuss max pooling or preprocessing along the spatial dimension of the feature map. To address this gap, a method is proposed that applies frequency-domain preprocessing along both the channel and spatial dimensions. Global average pooling is first proved theoretically to be a special case of frequency-domain preprocessing. Frequency-domain preprocessing is then applied to the feature map along both the channel and spatial directions, yielding the frequency mixture attention model. Experiments on small-object detection datasets show that, at comparable computational cost, the average prediction accuracy improves by 2, 1.8, and 1.4 percentage points over SENet, CBAM, and FcaNet, respectively.

Keywords: object detection, mixed attention model, frequency domain, small object
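
The statement that global average pooling is a special case of frequency-domain preprocessing can be checked numerically. The following is a minimal NumPy sketch, assuming the unnormalized DCT-II basis used in the FcaNet formulation. The function names dct_basis_2d, dct_pool, and dct_pool_channels are hypothetical, and the channel-axis variant is only one possible reading of preprocessing for spatial attention, not the authors' confirmed method.

```python
import numpy as np


def dct_basis_2d(u, v, height, width):
    """Unnormalized 2D DCT-II basis for frequency (u, v), shape (height, width)."""
    rows = np.cos(np.pi * u * (np.arange(height) + 0.5) / height)
    cols = np.cos(np.pi * v * (np.arange(width) + 0.5) / width)
    return np.outer(rows, cols)


def dct_pool(feature_map, u, v):
    """Reduce a (C, H, W) feature map to one scalar per channel at frequency (u, v)."""
    _, height, width = feature_map.shape
    basis = dct_basis_2d(u, v, height, width)
    return (feature_map * basis).sum(axis=(1, 2))


def dct_pool_channels(feature_map, k):
    """Assumed spatial-attention analogue: 1D DCT along the channel axis.

    Reduces a (C, H, W) feature map to an (H, W) map at channel frequency k.
    """
    channels = feature_map.shape[0]
    basis = np.cos(np.pi * k * (np.arange(channels) + 0.5) / channels)
    return np.tensordot(basis, feature_map, axes=(0, 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 7, 7))  # toy feature map: C=4, H=W=7

    # At frequency (0, 0) every basis entry is cos(0) = 1, so the DCT
    # component is the plain sum, i.e. global average pooling times H*W.
    gap = x.mean(axis=(1, 2))
    print(np.allclose(dct_pool(x, 0, 0), gap * 7 * 7))

    # The same argument along the channel axis recovers channel-wise
    # average pooling (the kind CBAM uses for spatial attention) times C.
    print(np.allclose(dct_pool_channels(x, 0), x.mean(axis=0) * 4))
```

Nonzero frequencies such as dct_pool(x, 1, 2) give descriptors that average pooling discards, which is the sense in which frequency-domain preprocessing can increase feature diversity.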

WANG Yuren, WU De’an, ZHU Li. Frequency Mixture Attention Module[J]. Computer Engineering and Applications, 2022, 58(24): 173-179.
