Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (6): 163-171. DOI: 10.3778/j.issn.1002-8331.2210-0420

• Pattern Recognition and Artificial Intelligence •

Deep Neural Network Channel Pruning Compression Method Based on Filter Elasticity

LI Ruiquan, ZHU Lu, LIU Yuanyuan   

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  • Online: 2024-03-15  Published: 2024-03-15

Abstract: Deep neural networks (DNNs) have achieved great success in many fields, but their high computational and storage costs make it difficult to deploy them directly on resource-constrained mobile devices. To address this problem, the global importance of filters in a network is studied, and a channel pruning compression method based on filter elasticity is proposed to reduce the size of neural networks. First, a layer-wise local dynamic threshold is introduced to remedy the over-pruning caused by L1 regularization (L1 lasso) sparse training. Then, the output of each convolutional layer is multiplied by a channel scaling factor, replacing the ordinary convolution module; the global importance of a filter is defined by its elasticity, whose value is estimated with the Taylor formula and then ranked. A new iterative filter pruning framework is also designed to balance pruning performance against pruning speed. Finally, composite channel pruning is performed using the improved L1-regularized training together with the global filter importance. Applying the proposed method to VGG-16 on CIFAR-10 reduces floating-point operations (FLOPs) by 80.2% and the number of parameters by 97.0% with no noticeable loss of accuracy, demonstrating that the method can compress neural networks at a large scale for deployment on resource-constrained terminal devices.
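
The central mechanism described above is a learnable channel scaling factor multiplied onto each convolution's output, with each filter's global importance ("elasticity") estimated from the Taylor formula applied to that factor. A minimal PyTorch sketch of this idea follows; the class name ScaledConv, the wrapper structure, and the first-order criterion |γ·∂L/∂γ| are assumptions made for illustration, not the authors' implementation.

```python
# Sketch only (not the authors' code): a convolution wrapped with a per-channel
# scaling factor, and a first-order Taylor estimate of each channel's importance.
import torch
import torch.nn as nn

class ScaledConv(nn.Module):
    """Convolution whose output is multiplied by a learnable channel scaling factor."""
    def __init__(self, in_ch, out_ch, kernel_size, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)
        self.scale = nn.Parameter(torch.ones(out_ch))  # one factor per output channel

    def forward(self, x):
        y = self.conv(x)
        return y * self.scale.view(1, -1, 1, 1)  # broadcast over batch and spatial dims

def taylor_importance(module: ScaledConv) -> torch.Tensor:
    """Per-channel importance |gamma * dL/dgamma| (assumed first-order Taylor criterion).

    Call after loss.backward(); a larger value suggests that removing the channel
    would change the loss more, so the channel is more important to keep.
    """
    g = module.scale
    return (g.detach() * g.grad.detach()).abs()
```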

Key words: model compression, filter importance, channel pruning, scaling factor, elasticity
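
The abstract's first step, a layer-wise local dynamic threshold that tempers the over-pruning a single global cutoff can cause during L1-regularized (L1 lasso) sparse training, can be sketched roughly as below. The percentile rule and the layer_keep_min parameter are hypothetical illustrations of a "local dynamic threshold"; the paper's actual thresholding formula is not reproduced here.

```python
# Hedged sketch of layer-wise (local) dynamic thresholding: instead of pruning all
# channels below one global cutoff on the L1-sparsified scaling factors, each layer
# is guaranteed to keep at least a fraction of its own channels.
import torch

def local_channel_masks(scales_per_layer, global_ratio=0.6, layer_keep_min=0.2):
    """Return a boolean keep-mask per layer.

    scales_per_layer: list of 1-D tensors of scaling factors, one tensor per layer.
    global_ratio:     target fraction of channels to prune over the whole network.
    layer_keep_min:   every layer keeps at least this fraction of its channels,
                      limiting the over-pruning a single global threshold can cause.
    """
    all_scales = torch.cat([s.abs() for s in scales_per_layer])
    global_thr = torch.quantile(all_scales, global_ratio)
    masks = []
    for s in scales_per_layer:
        s = s.abs()
        # Local cap: never prune below the layer's own keep quota.
        local_thr = torch.quantile(s, 1.0 - layer_keep_min)
        thr = torch.minimum(global_thr, local_thr)
        masks.append(s >= thr)
    return masks
```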