Roadside Target Detection Algorithm for Complex Traffic Scene Based on Improved YOLOv5s

doi:10.3778/j.issn.1002-8331.2304-0090

Abstract

Abstract: Traditional roadside target detection models detect small targets such as pedestrians, non-motorized vehicles, and occluded vehicles with low accuracy, and the models themselves are too large. To address both problems, a roadside target detection model based on an improved YOLOv5s is proposed. First, EIoU Loss replaces the original CIoU Loss as the bounding-box regression loss, which speeds up convergence of the regression loss and improves the regression accuracy of the predicted boxes. Second, the lightweight, general-purpose upsampling operator CARAFE replaces the original nearest-neighbor interpolation upsampling module, reducing the loss of feature information during upsampling. Third, a small-target detection branch with a smaller detection scale is added to the original three-scale detection layers, and an efficient decoupled prediction head is proposed to decouple the detection layers of different scales, further improving the model's ability to detect small targets. Finally, channel pruning is applied to the improved model to remove redundant channels that have little impact on detection performance, reducing model size and making the model better suited to roadside target detection under resource-constrained conditions. Experiments on the roadside target detection dataset DAIR-V2X-I show that, compared with the original YOLOv5s, the improved algorithm reduces model size by 5.7 MB while raising mAP50 and mAP50:95 by 2.5 and 3.8 percentage points, to 90.3% and 67.7% respectively, at a detection speed of 89 FPS. Compared with other mainstream object detection algorithms, the improved model has advantages in detection accuracy, model size, and detection speed, making it suitable for roadside target detection in complex traffic scenes.

Key words: object detection, roadside perception, YOLOv5, EIoU Loss, CARAFE, decoupled prediction head, channel pruning
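
The abstract names EIoU Loss as the replacement for CIoU Loss but does not give its form. As a minimal sketch, the commonly published EIoU formulation adds three penalties to the IoU term: the squared center distance normalized by the squared diagonal of the smallest enclosing box, and the squared width and height differences normalized by the squared width and height of that enclosing box. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration; this is not the authors' implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Illustrative EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for explanation only; a training pipeline would use a
    batched tensor version instead of plain Python floats.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, each normalized by the matching side
    # of the enclosing box (this is what distinguishes EIoU from CIoU).
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Because the width and height penalties act on each side directly rather than on the aspect ratio, the loss still produces a gradient signal when the predicted box has the correct aspect ratio but the wrong size.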

YANG Ruining, HUI Fei, JIN Xin, HOU Ruiyu. Roadside Target Detection Algorithm for Complex Traffic Scene Based on Improved YOLOv5s[J]. Computer Engineering and Applications, 2023, 59(16): 159-169.
