Research on Imbalanced Training of Deep Learning Target Detection Model

doi:10.3778/j.issn.1002-8331.2009-0270

Abstract

Abstract: Target detection, one of the core computer vision tasks, has become an active research topic. Deep-learning-based target detection algorithms continue to emerge, but in most cases researchers focus only on model architecture and overlook the training process. Target detection networks exhibit pronounced imbalance problems during training, which degrade detection performance and keep the model from reaching its expected best result. The imbalance arises mainly at two levels: the feature map level and the objective function level. To fully exploit the potential of the detection architecture and obtain a better training process, two modules, Balanced Feature Pyramid and Balanced L1 Loss, are adopted and added to a Faster R-CNN based on ResNet-50-FPN, with the aim of resolving the imbalance at both levels during training of the Faster R-CNN model. Experiments on the MSCOCO dataset show that the balanced model reaches 38.5% AP, which is 1.1 percentage points higher than the original Faster R-CNN target detection model.

Key words: target detection, deep learning, imbalance problem, Faster R-CNN
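
As an illustration of the objective-function-level balancing mentioned in the abstract, the following is a minimal sketch of Balanced L1 Loss as formulated in the Libra R-CNN work by Pang et al. The function names and the defaults alpha = 0.5, gamma = 1.5, beta = 1.0 are assumptions taken from that formulation; the abstract does not state which values the authors used. A plain Smooth L1 loss is included only for comparison.

```python
import math


def balanced_l1(diff, alpha=0.5, gamma=1.5, beta=1.0):
    """Balanced L1 loss for a single box-regression residual (pred - target).

    alpha, gamma and beta are assumed defaults, not values from the abstract.
    """
    x = abs(diff)
    # b is fixed by the constraint alpha * ln(b + 1) = gamma, which makes
    # the gradient continuous where the two branches meet (x == beta).
    b = math.exp(gamma / alpha) - 1.0
    if x < beta:
        # Inlier branch: gradient alpha * ln(b * x / beta + 1), which is
        # larger than the Smooth L1 gradient for small residuals.
        return alpha / b * (b * x + 1.0) * math.log(b * x / beta + 1.0) - alpha * x
    # Outlier branch: linear with slope gamma, so the gradient stays bounded.
    return gamma * x + gamma / b - alpha * beta


def smooth_l1(diff, beta=1.0):
    """Standard Smooth L1 loss, shown only as the baseline for comparison."""
    x = abs(diff)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta


if __name__ == "__main__":
    for r in (0.1, 0.5, 1.0, 2.0):
        print(f"residual={r:.1f}  smooth_l1={smooth_l1(r):.4f}  balanced_l1={balanced_l1(r):.4f}")
```

Under these assumed defaults, small residuals (accurate samples) contribute more loss and gradient than under Smooth L1, while large residuals still produce a bounded gradient of gamma, which is the balancing effect the module is meant to provide.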

摘要： 目标检测作为计算机视觉的任务之一已经成为研究热点问题。目前，基于深度学习的目标检测算法层出不穷，但大多数情况下学者只关心它们的模型架构，而忽视了其训练过程。目标检测网络在训练过程中会存在明显的不平衡问题，导致模型检测性能降低，不能达到预期的最佳效果。不平衡问题主要包括两个层次，分别是特征图层次和目标函数层次。为了能够充分发挥目标检测模型架构的潜力，实现更好的训练过程，提出利用Balanced Feature Pyramid和Balanced L1 Loss两个模块，同时将它们加入到基于ResNet-50-FPN的Faster R-CNN中，目的是解决Faster R-CNN模型在训练过程中存在的特征图层次和目标函数层次的不平衡问题。通过在MSCOCO数据集上验证，实验结果表明平衡后的模型可达到AP是38.5%的结果，比原Faster R-CNN目标检测模型提高了1.1个百分点。

关键词: 目标检测, 深度学习, 不平衡问题, Faster R-CNN

HE Yuzhe, HE Ning, ZHANG Ren, LIANG Yubo, LIU Xiaoxiao. Research on Imbalanced Training of Deep Learning Target Detection Model[J]. Computer Engineering and Applications, 2022, 58(5): 172-178.

贺宇哲, 何宁, 张人, 梁煜博, 刘晓晓. 面向深度学习目标检测模型训练不平衡研究[J]. 计算机工程与应用, 2022, 58(5): 172-178.
