Research on Multi-Path Lightweight Convolutional Neural Network

doi:10.3778/j.issn.1002-8331.2110-0083

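Abstract: Numerous studies have shown that widening a convolutional neural network allows it to extract more diverse features. However, when the channel width of a model is widened, the number of trainable parameters grows quadratically, which leads to high training costs and large models. To address this problem, a multi-path module is proposed. It optimizes the computational structure of the paths inside the module so that the model can extract diverse features economically and efficiently. Specifically, compared with conventional width expansion, the multi-path module avoids widening along the channel dimension and instead widens along the path dimension. This preserves the diversity of the output features while effectively reducing the number of model parameters. Because deeper convolutions more readily extract abstract semantic information, the multi-path module assigns a different convolution depth to each computational path. This gives the module multi-scale feature extraction capability: the output features contain both detailed positional information and progressively more abstract semantic information. During the research, when an attention mechanism was used to improve the relationships between the paths within the module, it was observed that the benefit of the attention mechanism depends on certain conditions. Experimental results show that a 5.3 MB lightweight convolutional neural network built from multi-path modules achieves 1.32% higher classification accuracy on CIFAR-10 than the 43.4 MB ResNet-18. After further optimization (33 MB), the model reaches a classification accuracy of 95.15%, which is 0.65% higher than SE-Net18 (45.1 MB).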
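The parameter-count argument above can be made concrete with a little arithmetic. The sketch below is only an illustration. It rests on assumptions the abstract does not state: every layer is a bias-free 3×3 convolution with equal input and output width, and each path is a single C→C convolution. The helper names are hypothetical and do not describe the authors' actual module. The sketch shows that widening the channels by a factor p multiplies the weights by p², while adding p parallel paths multiplies them only by p. It also shows that stacking more stride-1 3×3 convolutions on a path enlarges its receptive field, which is one way different path depths yield multi-scale features.

```python
# Hypothetical illustration, NOT the paper's implementation. Assumes plain
# k x k convolutions without bias and equal input/output widths per layer.

def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Number of weights in one k x k convolution layer (bias omitted)."""
    return k * k * c_in * c_out


def channel_widened_params(c: int, p: int, k: int = 3) -> int:
    """One layer widened along channels: input and output both become p * c."""
    return conv_params(p * c, p * c, k)


def path_widened_params(c: int, p: int, k: int = 3) -> int:
    """p independent paths, each a single c -> c convolution (widening along paths)."""
    return p * conv_params(c, c, k)


def receptive_field(depth: int, k: int = 3) -> int:
    """Receptive field of `depth` stacked stride-1 k x k convolutions."""
    return 1 + depth * (k - 1)


if __name__ == "__main__":
    c = 64
    for p in (1, 2, 4):
        # Channel widening grows as p**2; path widening grows as p.
        print(p, channel_widened_params(c, p), path_widened_params(c, p))
    # Paths with depths 1..4 cover progressively larger input regions.
    print([receptive_field(d) for d in range(1, 5)])
```

With C = 64 and p = 4, widening the channels costs 589,824 weights per layer, whereas four parallel paths cost 147,456. Under these assumptions, giving paths different depths adds 9·C² weights per extra layer on a path, so the growth in the number of paths remains linear.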

Key words: multi-path, lightweight, network width, feature distribution, attention mechanism

ZHAO Lixin, BAI Yu, AN Shengbiao. Research on Multi-Path Lightweight Convolutional Neural Network[J]. Computer Engineering and Applications, 2023, 59(6): 134-145.
