Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (21): 180-187. DOI: 10.3778/j.issn.1002-8331.2105-0295

• Pattern Recognition and Artificial Intelligence •


Neural Network Compression Algorithm Based on Adversarial Learning and Knowledge Distillation

LIU Jinjin, LI Qingbao, LI Xiaonan   

  1. Information Engineering University, Zhengzhou 450003, China
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450003, China
  3. School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China
  • Online: 2021-11-01  Published: 2021-11-04


Abstract:

Face recognition models based on deep learning are difficult to deploy on embedded devices and suffer from poor real-time performance. To address these problems, existing model compression and acceleration algorithms are studied in depth, and a neural network compression algorithm based on knowledge distillation and adversarial learning is proposed. The framework consists of three parts: a pre-trained large-scale teacher network, a lightweight student network, and a discriminator that assists adversarial learning. The traditional knowledge distillation loss is improved by adding an indicator function, so that the student network learns only the classification probabilities of samples that the teacher network identifies correctly. Since intermediate-layer feature maps carry rich high-dimensional features, a discriminator from the adversarial learning strategy is introduced to distinguish the student network from the teacher network at the feature-map level. To further improve the generalization ability of the student network so that it can be applied to different machine vision tasks, the teacher and student networks learn from each other and update alternately in the latter half of training, allowing the student network to explore its own optimal solution space. The method is validated on the CASIA-WebFace and CelebA datasets. Experimental results show that the recognition accuracy of the small student network obtained by knowledge distillation is only about 1.5% lower than that of the fully supervised teacher network. Compared with a feature-map-oriented knowledge distillation algorithm and a model compression algorithm based on adversarial training, the proposed method achieves higher face recognition accuracy.
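
To make the modified distillation loss concrete, the following is a minimal PyTorch-style sketch of how an indicator function can mask the distillation term so that the student distills soft labels only on samples the teacher classifies correctly. The temperature T and weight alpha are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def masked_distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Indicator function: 1 where the teacher's top-1 prediction matches the label.
    correct = (teacher_logits.argmax(dim=1) == labels).float()  # shape: [batch]

    # Softened KL-divergence distillation term, kept per-sample so it can be masked.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)

    # Hard-label cross-entropy on the ground truth, also per-sample.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Only teacher-correct samples contribute to the distillation term.
    return (alpha * correct * kd + (1.0 - alpha) * ce).mean()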

Key words: knowledge distillation, adversarial learning, mutual learning, model compression, face recognition
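
The feature-map discriminator described in the abstract can be illustrated with a hedged sketch: a small convolutional classifier is trained to separate teacher feature maps from student feature maps, while the student is trained to fool it. The architecture and losses below are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    # Small CNN that outputs one logit: teacher feature map (1) vs student (0).
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 1),
        )

    def forward(self, fmap):
        return self.net(fmap)

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(disc, teacher_fmap, student_fmap):
    # The discriminator learns to tell teacher and student features apart.
    ones = torch.ones(teacher_fmap.size(0), 1)
    zeros = torch.zeros(student_fmap.size(0), 1)
    return bce(disc(teacher_fmap.detach()), ones) + bce(disc(student_fmap.detach()), zeros)

def student_adversarial_loss(disc, student_fmap):
    # The student tries to make its feature maps indistinguishable from the teacher's.
    return bce(disc(student_fmap), torch.ones(student_fmap.size(0), 1))

Similarly, the mutual-learning phase, in which teacher and student update alternately in the latter half of training, can be sketched as follows, in the spirit of deep mutual learning; the optimizers and the alternation schedule are assumptions.

import torch
import torch.nn.functional as F

def mutual_learning_step(teacher, student, opt_t, opt_s, images, labels):
    # Student step: fit the labels and the teacher's current predictions.
    with torch.no_grad():
        t_ref = teacher(images)
    s_logits = student(images)
    loss_s = F.cross_entropy(s_logits, labels) + F.kl_div(
        F.log_softmax(s_logits, dim=1), F.softmax(t_ref, dim=1), reduction="batchmean"
    )
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()

    # Teacher step: fit the labels and the just-updated student's predictions.
    with torch.no_grad():
        s_ref = student(images)
    t_logits = teacher(images)
    loss_t = F.cross_entropy(t_logits, labels) + F.kl_div(
        F.log_softmax(t_logits, dim=1), F.softmax(s_ref, dim=1), reduction="batchmean"
    )
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()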