Object Detection Method Based on Multi-scale Feature Fusion for Driving Scene

doi:10.3778/j.issn.1002-8331.2101-0430

Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (14): 134-141. DOI: 10.3778/j.issn.1002-8331.2101-0430

HUANG Tongyu, HU Binjie, ZHU Tingting, HUANG Zhewen

1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China
2. Faculty of Mega Data and Computer Science, Guangdong Baiyun University, Guangzhou 510450, China
3. Department of Technology, Guangzhou Shengfa Technology Service Co., Ltd., Guangzhou 510308, China

Publication date: 2021-07-15  Release date: 2021-07-14

Abstract

To address the low detection accuracy of convolutional neural network models for object detection in driving scenes, a multi-scale feature fusion object detection method based on an improved RefineDet network is proposed. First, an LFIP (Light-weight Featurized Image Pyramid) network is embedded in RefineDet, and the multi-scale feature maps generated by LFIP are fused with the main feature maps output by the ARM (Anchor Refinement Module) of RefineDet. This improves the preliminary anchor classification and regression in the feature layers and supplies refined anchors to the ODM (Object Detection Module) for further regression and multi-class prediction. Second, a multi-branch RFB (Receptive Field Block) is embedded after the ODM to obtain receptive fields of different scales in the detection task and thereby improve the features extracted by the backbone network. Third, the activation functions in the model are replaced with PReLU (Parametric Rectified Linear Unit), a nonlinear activation function with learnable parameters, to speed up convergence. Fourth, the bounding-box regression loss of RefineDet is replaced with Repulsion Loss, which pulls each predicted box closer to its designated target box while pushing it away from neighboring target boxes and predicted boxes, improving detection accuracy under occlusion. Finally, a driving-view object detection dataset of 48 260 images is constructed, with 38 608 images for training and 9 652 for testing, and the method is validated on a mainstream GPU hardware platform. The method achieves an mAP of 85.59%, better than RefineDet and other improved algorithms, and runs at 41.7 frame/s, which meets the application requirements of object detection in driving scenes. Experimental results show that, at the cost of a slight decrease in detection speed, the proposed method improves object detection accuracy in driving scenes and, to a certain extent, solves the problems of occluded-object and small-object detection.
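The pull-and-push behavior that the abstract attributes to Repulsion Loss can be illustrated with a small sketch. The code below is a simplified single-box version, not the authors' implementation: it assumes boxes in `(x1, y1, x2, y2)` form, applies the attraction term to raw coordinates rather than encoded offsets, and uses placeholder weights `alpha` and `beta` and a placeholder threshold `sigma`. All function names are hypothetical.

```python
import math

def smooth_l1(x):
    """Smooth L1 penalty, used here for the attraction term."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5

def smooth_ln(x, sigma=0.5):
    """Overlap penalty: logarithmic up to sigma, linear beyond it (sigma < 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection(a, b):
    """Area of the overlap rectangle of two boxes (0 if they are disjoint)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def iou(a, b):
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)

def iog(pred, gt):
    """Intersection over the ground-truth area."""
    return intersection(pred, gt) / area(gt)

def repulsion_loss(pred, target, other_gts, other_preds, alpha=0.5, beta=0.5):
    """Simplified loss for one predicted box (assumed weights alpha, beta)."""
    # Attraction: pull the predicted box toward its own target box.
    attract = sum(smooth_l1(p - t) for p, t in zip(pred, target))
    # Repulsion from the non-target ground-truth box with the largest IoU.
    rep_gt = 0.0
    if other_gts:
        nearest = max(other_gts, key=lambda g: iou(pred, g))
        rep_gt = smooth_ln(iog(pred, nearest))
    # Repulsion from predicted boxes that belong to other targets.
    rep_box = sum(smooth_ln(iou(pred, q)) for q in other_preds)
    return attract + alpha * rep_gt + beta * rep_box

if __name__ == "__main__":
    box = (0.0, 0.0, 10.0, 10.0)
    neighbor = (5.0, 0.0, 15.0, 10.0)
    print(repulsion_loss(box, box, [neighbor], []))
```

In this sketch a predicted box that matches its target exactly still incurs a penalty when it overlaps a neighboring target, which is the property the abstract relies on for occluded objects.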

Key words: deep learning, convolutional neural network, object detection, RefineDet algorithm, Receptive Field Block (RFB), Light-weight Featurized Image Pyramid (LFIP), Parametric Rectified Linear Unit (PReLU), loss function, occluded object
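As a reminder of what the "learnable parameters" of PReLU refer to, the sketch below shows the activation and a gradient step on its negative-side slope. It is a scalar illustration only: the initial slope of 0.25, the learning rate, and the helper names are assumptions made here, not values reported in the paper.

```python
def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope a for the rest."""
    return x if x > 0 else a * x

def prelu_grad_a(x):
    """Derivative of prelu with respect to a: 0 for x > 0, otherwise x."""
    return 0.0 if x > 0 else x

def sgd_step_on_slope(a, inputs, upstream_grads, lr=0.01):
    """One gradient-descent update of the slope a (hypothetical helper)."""
    grad = sum(g * prelu_grad_a(x) for x, g in zip(inputs, upstream_grads))
    return a - lr * grad

if __name__ == "__main__":
    print([prelu(v) for v in (-2.0, 0.5)])
    print(sgd_step_on_slope(0.25, [-1.0, 2.0], [1.0, 1.0], lr=0.1))
```

With `a` fixed at 0 the function reduces to ReLU; letting `a` be updated during training is what distinguishes PReLU.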

HUANG Tongyu, HU Binjie, ZHU Tingting, HUANG Zhewen. Object Detection Method Based on Multi-scale Feature Fusion for Driving Scene[J]. Computer Engineering and Applications, 2021, 57(14): 134-141.

