Road Object Detection Method for Complex Road Scenes

doi:10.3778/j.issn.1002-8331.2212-0093

Abstract

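Abstract: Small targets in complex traffic scenes are detected with low accuracy and are prone to false and missed detections. To address this, the paper proposes YOLOv5s-MRS, a road object detection algorithm based on an improved YOLOv5s. First, a feedback-based feature extraction network (RFP-PAN) is proposed: it adds a shallow feature layer and feedback connections and introduces an IASPP module, so that feature information at different scales is fully fused and the network's feature fusion capability improves. Second, a cascaded attention mechanism (SECA) is proposed that focuses on important features in the channel and spatial dimensions, so that the algorithm attends to more useful information. Finally, the lightweight Ghost module is used to reduce the algorithm's parameter count, computational cost, and model size. Experimental results show that YOLOv5s-MRS reaches detection accuracies of 93.4% on the KITTI dataset and 40.8% on the VisDrone2021 DET dataset, 1.6 and 8.6 percentage points higher than the original algorithm, respectively, with a model size of 12.9 MB. The algorithm achieves good detection accuracy while maintaining real-time performance and, to some extent, resolves the missed and false detection of small targets.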

Keywords: YOLOv5s, recursive pyramid, attention mechanism, GhostNet
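The gains quoted in the abstract are given in percentage points, so the baseline scores they imply can be recovered by subtraction. The Python sketch below does that arithmetic. The function names `implied_baseline` and `relative_gain_pct` are illustrative, and the relative improvements it prints are derived here, not figures reported by the authors.

```python
def implied_baseline(score_pct, gain_points):
    """Baseline score (in %) implied by a final score and a gain in percentage points."""
    return round(score_pct - gain_points, 1)


def relative_gain_pct(score_pct, gain_points):
    """Relative improvement over the implied baseline, in percent (derived, not reported)."""
    baseline = score_pct - gain_points
    return round(100.0 * gain_points / baseline, 1)


# Figures quoted in the abstract: (final accuracy in %, gain in percentage points)
reported = {"KITTI": (93.4, 1.6), "VisDrone2021 DET": (40.8, 8.6)}

for name, (score, gain) in reported.items():
    print(name, implied_baseline(score, gain), relative_gain_pct(score, gain))
```

The implied baselines are 91.8% on KITTI and 32.2% on VisDrone2021 DET, which shows that the improvement is far larger, in relative terms, on the dataset dominated by small targets.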

Abstract: To address the low detection accuracy for small-scale targets in complex traffic scenes, where false and missed detections are common, a road object detection algorithm, YOLOv5s-MRS, based on an improved YOLOv5s is proposed. First, a feature extraction network based on a feedback mechanism (RFP-PAN) is proposed; it adds a shallow feature layer and feedback connections and introduces the IASPP module to fully fuse feature information at different scales. Second, a cascaded attention mechanism (SECA) is proposed to focus on important features in the channel and spatial dimensions, so that the model attends to more useful information. Finally, the Ghost module is used to reduce the model's parameter count, computational cost, and storage footprint. Experimental results show that the detection accuracy of YOLOv5s-MRS reaches 93.4% on the KITTI dataset and 40.8% on the VisDrone2021 DET dataset, which is 1.6 and 8.6 percentage points higher than the original algorithm, respectively, with a model size of 12.9 MB. YOLOv5s-MRS delivers good detection accuracy while maintaining real-time performance and, to some extent, mitigates missed and false detections of small targets.

Key words: YOLOv5s, recursive feature pyramid, attention mechanism, GhostNet
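To make the Ghost module's saving concrete, the following minimal Python sketch compares the weight count of a standard convolution with that of a Ghost module, following the general GhostNet formulation: a primary convolution produces a fraction of the output channels, and cheap depthwise operations generate the rest. The layer sizes, the ratio `s=2`, and the cheap-kernel size `d=3` are assumptions chosen for illustration, not values taken from YOLOv5s-MRS; biases and batch-norm parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights in a Ghost module (bias ignored).

    A primary k x k convolution produces c_out // s intrinsic maps; each
    intrinsic map then yields (s - 1) extra "ghost" maps through a cheap
    d x d depthwise convolution.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s in this sketch")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (s - 1) * d * d  # one d x d filter per ghost map
    return primary + cheap


# Hypothetical layer: 128 -> 256 channels with a 3 x 3 kernel
standard = conv_params(128, 256, 3)
ghost = ghost_params(128, 256, 3)
print(standard, ghost, round(standard / ghost, 2))
```

With `s = 2` the weight count of this single layer roughly halves. The 12.9 MB model size reported in the abstract depends on the full architecture and cannot be derived from this sketch.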

盛博莹, 侯进, 李嘉新, 党辉. 面向复杂交通场景的道路目标检测方法[J]. 计算机工程与应用, 2023, 59(15): 87-96.

SHENG Boying, HOU Jin, LI Jiaxin, DANG Hui. Road Object Detection Method for Complex Road Scenes[J]. Computer Engineering and Applications, 2023, 59(15): 87-96.
