Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (21): 180-187.DOI: 10.3778/j.issn.1002-8331.2105-0295


Neural Network Compression Algorithm Based on Adversarial Learning and Knowledge Distillation

LIU Jinjin, LI Qingbao, LI Xiaonan   

  1. Information Engineering University, Zhengzhou 450003, China
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450003, China
  3. School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China
  Online: 2021-11-01    Published: 2021-11-04


Abstract:

To address the difficulty of deploying deep-learning-based face recognition models on embedded devices and their poor real-time performance, existing model compression and acceleration algorithms are studied in depth, and a neural network compression algorithm based on knowledge distillation and adversarial learning is proposed. The framework consists of three parts: a pre-trained large-scale teacher network, a lightweight student network, and a discriminator that assists adversarial learning. The traditional knowledge distillation loss is improved by adding an indicator function, so that the student network learns only the class probabilities of samples that the teacher network classifies correctly. Since intermediate feature maps carry rich high-dimensional features, the discriminator of the adversarial learning strategy is introduced to distinguish the student network from the teacher network at the feature-map level. Furthermore, to improve the generalization ability of the student network so that it can be applied to different machine vision tasks, the teacher and student networks learn from each other and are updated alternately in the latter half of training, allowing the student network to explore its own optimal solution. The method is verified on the CASIA-WebFace and CelebA datasets. Experimental results show that the recognition accuracy of the small student network obtained by knowledge distillation is only about 1.5% lower than that of the fully supervised teacher network. Compared with a feature-map-oriented knowledge distillation algorithm and a model compression algorithm based on adversarial training, the proposed method achieves better performance.
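As a rough illustration of the two loss components described above, the following PyTorch-style sketch shows one way the indicator-masked distillation loss and a feature-map-level discriminator could be realized. All class names, layer sizes, and the temperature value are assumptions made for illustration only; the paper's actual implementation may differ.

    # Hypothetical sketch of the abstract's loss components; names and
    # hyper-parameters are illustrative assumptions, not the authors' code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def masked_distillation_loss(student_logits, teacher_logits, labels, T=4.0):
        """Soft-label KD loss computed only on samples the teacher classifies correctly.

        The indicator function is realized as a 0/1 mask over the batch, so
        samples the teacher gets wrong contribute nothing to the loss.
        """
        with torch.no_grad():
            teacher_correct = (teacher_logits.argmax(dim=1) == labels).float()  # indicator
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="none",
        ).sum(dim=1) * (T * T)                        # per-sample KD term
        denom = teacher_correct.sum().clamp(min=1.0)  # avoid division by zero
        return (kd * teacher_correct).sum() / denom

    class FeatureMapDiscriminator(nn.Module):
        """Small CNN that tries to tell teacher feature maps from student feature maps."""

        def __init__(self, channels):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, 1),                    # logit: teacher (real) vs. student (fake)
            )

        def forward(self, fmap):
            return self.net(fmap)

    def adversarial_losses(disc, teacher_fmap, student_fmap):
        """GAN-style losses at the feature-map level; the student acts as the generator."""
        real = disc(teacher_fmap.detach())
        fake = disc(student_fmap.detach())
        d_loss = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
                  + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))
        g_out = disc(student_fmap)
        g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
        return d_loss, g_loss

In a training loop, the student would typically minimize a weighted sum of the hard-label cross-entropy, the masked distillation loss, and g_loss, while the discriminator minimizes d_loss; the mutual-learning stage described in the abstract would additionally update the teacher from the student's soft outputs in the latter half of training.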

Key words: knowledge distillation, adversarial learning, mutual learning, model compression, face recognition
