Sparse Binary Programming Method for Pruning of Randomly Initialized Neural Networks
LU Lin, JI Fanfan, YUAN Xiaotong
1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China
3. School of Computer, Nanjing University of Information Science and Technology, Nanjing 210044, China
LU Lin, JI Fanfan, YUAN Xiaotong. Sparse Binary Programming Method for Pruning of Randomly Initialized Neural Networks[J]. Computer Engineering and Applications, 2023, 59(8): 138-147.