Research on Road Target Detection Algorithm Based on YOLOv5

doi:10.3778/j.issn.1002-8331.2206-0316

Abstract

To improve the accuracy of road target detection, this paper modifies the YOLOv5 network model in four ways. A bottom-up PANet network structure is introduced to strengthen feature fusion; a target attention mechanism with direction awareness and location information is adopted to sharpen the perception of target position; an additional YOLO detection head is added to improve learning on small targets; and an improved CIOU (ICIOU) target regression loss function is used. Together, these changes significantly improve the model's ability to learn image features and its detection accuracy. Experimental results show that the model reaches an mAP of 68.2% on the Huawei SODA10M dataset, 15.4 percentage points higher than the original YOLOv5 network. On this basis, the paper explores the influence of input image size on detection time and accuracy. The results show that appropriately increasing the input image size raises mAP noticeably (by 3.8 percentage points) while detection speed drops only moderately (by 23.3 percentage points).

Key words: deep learning, object detection, attention mechanism, intersection over union, PANet network structure
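
The abstract names an improved CIOU (ICIOU) loss but does not define it here. As background only, the following is a minimal sketch of the standard CIoU loss as it is commonly defined (an IoU term, a normalized center-distance term, and an aspect-ratio term). It is not the paper's ICIOU. The function names `iou` and `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target, eps=1e-9):
    """Standard CIoU loss (illustrative; not the paper's ICIOU variant)."""
    overlap = iou(pred, target)

    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - overlap + v + eps)

    return 1 - overlap + rho2 / c2 + alpha * v


if __name__ == "__main__":
    # Two equal-sized squares offset by one unit along each axis.
    print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

For two identical boxes all three terms vanish and the loss is zero; for boxes that do not overlap, the center-distance term still provides a gradient signal, which is the usual motivation for preferring this family of losses over plain IoU.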


WANG Peng, WANG Yulin, JIAO Bowen, WANG Hongchang, YU Yixuan. Research on Road Target Detection Algorithm Based on YOLOv5[J]. Computer Engineering and Applications, 2023, 59(1): 117-125.

