Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (22): 108-115. DOI: 10.3778/j.issn.1002-8331.2105-0134

• Pattern Recognition and Artificial Intelligence •

Object Detection Network Compression Method Based on Pruning and Quantization

YANG Guowei, XU Zhiwang, FANG Chen, WANG Yizhong   

  1. College of Electronic Information and Automation, Tianjin University of Science & Technology, Tianjin 300222, China
  • Online: 2022-11-15  Published: 2022-11-15

Abstract: To address the problems of redundant parameters, complex models, slow inference, and the difficulty of deployment on resource-constrained embedded devices in object detection networks, an object detection network compression method combining pruning and quantization is proposed. First, the object detection network is trained with sparsity regularization to obtain scaling factors, the proportion of channel importance within each convolutional layer is computed from the distribution of the scaling factors, and a dynamic threshold derived from the scaling factors is used to prune the convolutional layers that contribute little to the network. Then, 32-bit floating-point data are quantized to 8-bit integer data by uniform mapping, which reduces both the computational cost and the size of the network model. Finally, the compression method is validated with YOLO-series object detection networks on the Pedestrian and Vehicle dataset, the Hands dataset, and the VOC2012 dataset. The experimental results show that, after dynamic threshold pruning and uniform mapping quantization, the network model is compressed from 234 MB to under 10 MB with an accuracy loss of 4%, and the detection speed is increased fivefold, which effectively alleviates the difficulty of deployment.
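To make the two steps summarized above more concrete, the following is a minimal NumPy sketch, not the authors' implementation: it assumes the scaling factors are the batch-normalization coefficients collected after sparsity training, and the threshold rule, function names (dynamic_prune_mask, uniform_quantize_int8) and parameters are illustrative placeholders rather than the exact procedure of the paper.

import numpy as np

def dynamic_prune_mask(scaling_factors, keep_ratio=None):
    # Illustrative dynamic-threshold pruning: channels whose scaling factor falls
    # below a threshold derived from the factor distribution are marked for removal.
    gamma = np.abs(np.asarray(scaling_factors, dtype=np.float32))
    if keep_ratio is None:
        threshold = gamma.mean()                      # hypothetical rule: mean of the distribution
    else:
        threshold = np.quantile(gamma, 1.0 - keep_ratio)
    return gamma >= threshold                         # True = keep channel, False = prune

def uniform_quantize_int8(weights):
    # Uniform (symmetric) mapping of 32-bit floating-point values to 8-bit integers.
    w = np.asarray(weights, dtype=np.float32)
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0   # guard against an all-zero tensor
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale                                   # dequantize as q.astype(np.float32) * scale

# Example: prune the channels of one convolutional layer, then quantize the survivors.
gamma = np.random.rand(64)                            # scaling factors of a 64-channel layer
mask = dynamic_prune_mask(gamma)
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
q_weights, scale = uniform_quantize_int8(weights[mask])

In practice, channel pruning also requires adjusting the input channels of the following layer and fine-tuning before INT8 deployment; the sketch omits those steps.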

Key words: object detection network, model compression, dynamic threshold pruning, scaling factor, uniform mapping quantization