Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (5): 172-178. DOI: 10.3778/j.issn.1002-8331.2009-0270

• Pattern Recognition and Artificial Intelligence •

Research on Imbalanced Training of Deep Learning Target Detection Model

HE Yuzhe, HE Ning, ZHANG Ren, LIANG Yubo, LIU Xiaoxiao

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2. College of Smart City, Beijing Union University, Beijing 100101, China
  • Online: 2022-03-01 Published: 2022-03-01

Abstract: Target detection, one of the core tasks in computer vision, has become an active research topic. Target detection algorithms based on deep learning continue to appear in large numbers, but in most cases researchers attend only to the model architecture and overlook the training process. Target detection networks suffer from pronounced imbalance during training, which degrades detection performance and keeps the model from reaching its expected best results. The imbalance arises at two levels: the feature map level and the objective function level. To exploit the full potential of the detection architecture and achieve a better training process, this paper proposes adding two modules, Balanced Feature Pyramid and Balanced L1 Loss, to a Faster R-CNN based on ResNet-50-FPN, with the aim of resolving the imbalance at both levels during Faster R-CNN training. Experiments on the MSCOCO dataset show that the balanced model reaches an AP of 38.5%, which is 1.1 percentage points higher than the original Faster R-CNN target detection model.

Key words: target detection, deep learning, imbalance problem, Faster R-CNN
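
The abstract names Balanced L1 Loss as the remedy for imbalance at the objective function level but does not give its formula. The following is a minimal sketch of one common formulation of that loss, not necessarily the exact variant used by the authors. The function name `balanced_l1_loss` and the default values `alpha=0.5`, `gamma=1.5`, and `beta=1.0` are assumptions chosen for illustration; none of them come from this page. The idea shown is that small residuals (inliers) receive a larger gradient than under a smooth L1 loss, while the gradient of large residuals (outliers) is capped at `gamma`.

```python
import math


def balanced_l1_loss(diff, alpha=0.5, gamma=1.5, beta=1.0):
    """Balanced L1 loss for a single regression residual (illustrative sketch).

    alpha, gamma and beta are assumed defaults, not values taken from the paper.
    """
    x = abs(diff)
    # b is fixed by requiring a continuous gradient at x = beta:
    # alpha * ln(b + 1) = gamma
    b = math.exp(gamma / alpha) - 1.0
    if x < beta:
        # Inlier branch: gradient grows like alpha * ln(b * x / beta + 1)
        return (alpha / b) * (b * x + 1.0) * math.log(b * x / beta + 1.0) - alpha * x
    # Outlier branch: linear with slope gamma; the constant keeps the loss continuous
    return gamma * x + gamma / b - alpha * beta


if __name__ == "__main__":
    for residual in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(residual, balanced_l1_loss(residual))
```

In a detector such as Faster R-CNN, a loss of this kind would be applied per coordinate to the bounding-box regression residuals and then summed or averaged over the sampled boxes.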