Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (21): 95-101. DOI: 10.3778/j.issn.1002-8331.2010-0085

• Theory, Research and Development •

Parallel Selective Kernel Attention Based on HardSoftmax

ZHU Meng, MIN Weidong, ZHANG Yu, DUAN Jingwen

  1. School of Information Engineering, Nanchang University, Nanchang 330031, China
  2. School of Software, Nanchang University, Nanchang 330047, China
  3. Jiangxi Key Laboratory of Smart City, Nanchang 330047, China
  • Online: 2021-11-01    Published: 2021-11-04

Abstract:

Attention has been widely used in Convolutional Neural Networks (CNNs) and effectively improves their performance. At the same time, attention is very lightweight and requires almost no change to the original architecture of a CNN. This paper proposes Parallel Selective Kernel (PSK) attention based on HardSoftmax. Firstly, because Softmax contains an exponential operation that easily overflows for large positive inputs, this paper proposes the computationally safer HardSoftmax to replace Softmax. Then, unlike Selective Kernel (SK) attention, which performs the extraction and transformation of global features after feature fusion, PSK attention places them in a separate branch, which runs in parallel with multiple branches of different kernel sizes. Meanwhile, the transformation of global features uses group convolution to further reduce the number of parameters and Multiply-Adds (MAdds). Finally, the multiple branches with different kernel sizes are fused using HardSoftmax attention guided by the information in these branches. A wide range of image classification experiments shows that simply replacing Softmax with HardSoftmax maintains or improves the performance of the original attention. HardSoftmax also runs faster than Softmax in the experiments of this paper. PSK attention can match or outperform SK attention with fewer parameters and MAdds.
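For illustration, the sketch below shows one way such an exponential-free HardSoftmax could be written in PyTorch. The ReLU-plus-L1-normalization form, the function name hard_softmax, and the eps constant are assumptions made for this sketch, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def hard_softmax(x, dim=-1, eps=1e-6):
    """Exponential-free normalization (illustrative guess at HardSoftmax):
    clamp negatives with ReLU, then L1-normalize along `dim`."""
    x = F.relu(x)
    return x / (x.sum(dim=dim, keepdim=True) + eps)

# exp() overflows to inf in float32 for large positive inputs, which is the
# failure mode the abstract attributes to Softmax; the ReLU-based variant
# never calls exp(), so its outputs stay finite.
logits = torch.tensor([[1000.0, 999.0, 0.0]])
print(torch.exp(logits))             # tensor([[inf, inf, 1.]])
print(hard_softmax(logits, dim=1))   # finite weights summing to ~1
```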

Key words: Convolutional Neural Networks (CNNs), HardSoftmax, Parallel Selective Kernel (PSK) attention
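To make the described architecture more concrete, the following PyTorch sketch assembles a PSK-style attention block as one reading of the abstract: kernel branches of different sizes run in parallel with a separate global-feature branch, whose grouped 1x1 convolutions produce per-branch channel weights normalized with the HardSoftmax form sketched earlier. The class name, reduction ratio, group count, and kernel sizes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSKAttention(nn.Module):
    """Illustrative Parallel Selective Kernel attention block.

    Branches with different kernel sizes and a global-feature branch are
    arranged in parallel; the global branch yields one weight per
    (branch, channel), which is used to fuse the kernel branches.
    """

    def __init__(self, channels, kernel_sizes=(3, 5), reduction=16, groups=8):
        super().__init__()
        assert channels % groups == 0, "grouped conv needs channels divisible by groups"
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        ])
        hidden = groups * max(channels // (reduction * groups), 1)
        # Global-feature branch: global average pooling followed by a grouped
        # bottleneck that outputs len(kernel_sizes) * channels weights.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1, groups=groups, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels * len(kernel_sizes), 1, groups=groups, bias=False),
        )
        self.num_branches = len(kernel_sizes)

    def forward(self, x):
        n, c, _, _ = x.shape
        # Kernel branches, stacked along a new "branch" axis: (N, B, C, H, W).
        feats = torch.stack([branch(x) for branch in self.branches], dim=1)
        # Per-branch, per-channel weights from the parallel global branch.
        w = self.global_branch(x).view(n, self.num_branches, c, 1, 1)
        w = F.relu(w)
        w = w / (w.sum(dim=1, keepdim=True) + 1e-6)  # HardSoftmax across branches
        return (w * feats).sum(dim=1)

# Quick shape check.
x = torch.randn(2, 64, 32, 32)
print(PSKAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

The grouped 1x1 convolutions here stand in for the abstract's "group convolution for the transformation of global features", which keeps the weight-generating branch cheap in parameters and MAdds; the per-branch normalization realizes "HardSoftmax attention over branches of different kernel sizes".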