Computer Engineering and Applications, 2022, Vol. 58, Issue (24): 173-179. DOI: 10.3778/j.issn.1002-8331.2106-0094

• Pattern Recognition and Artificial Intelligence •

Frequency Mixture Attention Module

WANG Yuren, WU De’an, ZHU Li

  1. School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 610097, China
  • Online: 2022-12-15 Published: 2022-12-15

Abstract: Attention models for object detection, represented by SENet and CBAM, offer a new way to improve the accuracy of small-object detection. These models make one simplification in their design: they use global average pooling or max pooling as the preprocessing step. FcaNet proposed replacing average pooling with the discrete cosine transform (DCT) when preprocessing for channel attention, which increases feature diversity, but it does not discuss max pooling or preprocessing of the feature map along the spatial direction. To address this gap, a method of frequency-domain preprocessing along both the channel and spatial directions is proposed. First, it is proved theoretically that global average pooling is a special case of frequency-domain preprocessing. Frequency-domain preprocessing is then applied to the feature map along both the channel and spatial directions, yielding the frequency mixture attention model. Experimental results on small-object detection datasets show that, at a comparable computational cost, the average prediction accuracy improves by 2, 1.8, and 1.4 percentage points over SENet, CBAM, and FcaNet, respectively.

Key words: object detection, mixed attention model, frequency domain, small object
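
The claim that global average pooling is a special case of frequency-domain preprocessing can be checked numerically. The sketch below is illustrative only and is not the authors' implementation: it assumes an unnormalized DCT-II basis, and the function names `dct2_component` and `dct1_over_channels` are hypothetical. The channel-axis variant is one plausible reading of "preprocessing along the spatial direction"; the abstract does not specify the actual design.

```python
import numpy as np


def dct2_component(x, u, v):
    """(u, v) coefficient of an unnormalized 2D DCT-II of one H x W channel.

    Assumed basis: cos(pi*u*(i+0.5)/H) * cos(pi*v*(j+0.5)/W).
    """
    h, w = x.shape
    i = np.arange(h).reshape(-1, 1)
    j = np.arange(w).reshape(1, -1)
    basis = np.cos(np.pi * u * (i + 0.5) / h) * np.cos(np.pi * v * (j + 0.5) / w)
    return float(np.sum(x * basis))


def dct1_over_channels(x, k):
    """k-th unnormalized 1D DCT-II coefficient along the channel axis of a
    C x H x W tensor, giving one H x W map (hypothetical spatial-branch analogue).
    """
    c = x.shape[0]
    n = np.arange(c).reshape(-1, 1, 1)
    basis = np.cos(np.pi * k * (n + 0.5) / c)
    return np.sum(x * basis, axis=0)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 7, 5))  # hypothetical C x H x W feature map

# Channel direction: at u = v = 0 the basis is all ones, so the coefficient
# is the plain sum over H x W, i.e. H * W times global average pooling.
channel = features[0]
assert np.isclose(dct2_component(channel, 0, 0), channel.mean() * channel.size)

# Channel-axis analogue: at k = 0 the coefficient is C times the mean over channels.
assert np.allclose(dct1_over_channels(features, 0),
                   features.mean(axis=0) * features.shape[0])
```

Nonzero indices `(u, v)` or `k` select higher-frequency components, which average pooling discards. That is the sense in which frequency-domain preprocessing can carry more varied features than pooling alone.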