Research of YOLOv3 Based on Knowledge Distillation

doi:10.3778/j.issn.1002-8331.2101-0089

Abstract

Abstract: As a model compression method, knowledge distillation transfers knowledge from a large network (the teacher network) to a small network (the student network), bringing the accuracy of the small network closer to that of the large one. Knowledge distillation works well for image classification, but it has received less study in object detection, where results still leave room for improvement. Current distillation methods for object detection mainly distill the feature extraction layer, and they have two problems. First, the importance of the knowledge transferred by the teacher network is not measured. Second, only the output of the feature extraction layer is distilled, so the teacher network's knowledge is not fully transferred to the student network. To address the first problem, an information map is introduced as the supervision signal for distillation, which strengthens the student network's learning of the teacher network's key knowledge. To address the second problem, the outputs of the feature extraction layer and the feature fusion layer are distilled simultaneously, so that the student network learns the teacher network's knowledge more fully. Experimental results show that, with YOLOv3 as the detection model and without changing the structure of the student network, mean average precision (mAP) improves by 9.3 percentage points.

Key words: knowledge distillation, model compression, object detection, YOLOv3

LI Jiangnan, WU Xing, LIU Jingsheng, WANG Honggang. Research of YOLOv3 Based on Knowledge Distillation[J]. Computer Engineering and Applications, 2022, 58(17): 174-180.
