Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (14): 134-141.DOI: 10.3778/j.issn.1002-8331.2101-0430


Object Detection Method Based on Multi-scale Feature Fusion for Driving Scene

HUANG Tongyu, HU Binjie, ZHU Tingting, HUANG Zhewen   

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China
  2. Faculty of Big Data and Computer Science, Guangdong Baiyun University, Guangzhou 510450, China
  3. Department of Technology, Guangzhou Shengfa Technology Service Co., Ltd., Guangzhou 510308, China
  • Online: 2021-07-15  Published: 2021-07-14





To address the low detection accuracy of convolutional neural network models for object detection in driving scenes, a multi-scale feature-fusion object detection method based on an improved RefineDet is proposed. Firstly, an LFIP (Light-weight Featurized Image Pyramid) network is embedded in RefineDet, and the multi-scale feature maps it generates are fused with the main feature maps output by the ARM (Anchor Refinement Module). This improves the preliminary classification and regression of anchors in the convolutional layers and provides refined anchors to the ODM (Object Detection Module) for further regression and multi-class prediction. Secondly, a multi-branch RFB (Receptive Field Block) is embedded after the ODM to obtain receptive fields of different scales in the detection task and enhance the features extracted by the backbone network. Thirdly, the activation functions in the model are replaced with PReLU (Parametric Rectified Linear Unit), a nonlinear activation with learnable parameters, to speed up model convergence. Then, the bounding-box regression loss of RefineDet is replaced with the Repulsion Loss, which pulls each proposal closer to its designated target while pushing it away from surrounding non-target objects, improving detection accuracy under occlusion. Finally, an object detection dataset of 48 260 driving-scene images is constructed, with 38 608 images as the training set and 9 652 as the test set, and the method is verified on a mainstream GPU hardware platform. The proposed method achieves an mAP of 85.59%, outperforming RefineDet and other improved algorithms, and runs at 41.7 frame/s, meeting the application requirements of driving-scene object detection.
Experimental results show that, at the cost of a slight decrease in detection speed, the proposed method improves object detection accuracy in driving scenes and, to a certain extent, alleviates the problems of occluded-object and small-object detection.
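Two of the components named above can be made concrete with a short sketch. The following is not the authors' code but a minimal NumPy illustration of the PReLU activation (learnable negative-branch slope) and of the IoG overlap and Smooth-ln penalty that the Repulsion Loss applies between a proposal and surrounding non-target boxes; the function names and the `sigma` default are illustrative assumptions.

```python
import math
import numpy as np

def prelu(x, a):
    """PReLU: f(x) = x for x > 0, a * x otherwise.
    Unlike Leaky ReLU's fixed slope, `a` is a learnable parameter."""
    return np.where(x > 0, x, a * x)

def iog(box, gt):
    """Intersection-over-Ground-truth: overlap area divided by the
    ground-truth box area (boxes given as (x1, y1, x2, y2))."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    return inter / ((gt[2] - gt[0]) * (gt[3] - gt[1]))

def smooth_ln(x, sigma=0.5):
    """Smoothed -ln(1 - x) penalty used by the repulsion terms:
    logarithmic below `sigma`, linear above, continuous at the joint."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)
```

In the repulsion term, `smooth_ln(iog(proposal, non_target_gt))` grows sharply as a proposal overlaps a neighbouring ground-truth box, which is what pushes predictions apart in crowded, occluded scenes.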

Key words: deep learning, convolutional neural network, object detection, RefineDet algorithm, Receptive Field Block (RFB), Light-weight Featurized Image Pyramid (LFIP), Parametric Rectified Linear Unit (PReLU), loss function, occluded object

