Object Detection Algorithm for Complex Road Scenes Based on Adaptive Feature Fusion

doi:10.3778/j.issn.1002-8331.2207-0203

Abstract

To address the low detection accuracy, missed detections, and false detections that affect densely occluded and small-scale targets in complex road scenes, an object detection algorithm based on adaptive feature fusion is proposed, using YOLOv5 as the base framework. A feature fusion factor is introduced to improve the fusion of adjacent-scale features, increasing the number of effective samples at each network layer and thereby improving the detection of small- and medium-scale objects. A shallow feature detection layer is added to improve the model's ability to learn small-scale objects. The receptive field module is improved so that the model can adaptively select an effective receptive field for extracting target feature information. Quality Focal Loss is introduced to improve the localization accuracy of densely occluded and small-scale targets, and an attention mechanism is added to the feature fusion network so that the algorithm makes more effective use of feature information. Experimental results show that, compared with the original algorithm, the improved algorithm raises detection accuracy on the public datasets BDD100K (10 classes) and Udacity and on the self-built dataset CQTransport by 6.7, 4.9, and 7.9 percentage points, respectively. With essentially no reduction in detection speed, it improves detection performance in complex road scenes and, to a certain extent, alleviates the missed and false detections of densely occluded and small-scale targets.

Key words: object detection, complex road scenes, feature fusion factor, adaptive receptive field, multi-scale detection, YOLOv5
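
The feature fusion factor named in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which the upsampled deeper feature map is multiplied by a coefficient before being added to the shallower one. The abstract does not say how the paper chooses or computes this coefficient, so `alpha` below is a hypothetical constant, and the function names are illustrative only.

```python
def upsample_nearest_2x(feature):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def fuse_adjacent(shallow, deep, alpha):
    """Top-down fusion of two adjacent scales: shallow + alpha * upsample(deep).

    alpha is a hypothetical fusion factor; alpha = 1.0 gives plain
    FPN-style addition, smaller values damp the deeper layer's contribution.
    """
    up = upsample_nearest_2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("deep map must be exactly half the size of shallow map")
    return [
        [s + alpha * d for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(shallow, up)
    ]


shallow = [[1.0] * 4 for _ in range(4)]   # 4x4 shallow feature map
deep = [[2.0, 4.0], [6.0, 8.0]]           # 2x2 deeper feature map
fused = fuse_adjacent(shallow, deep, alpha=0.5)
```

In a real detector the maps would be multi-channel tensors and the factor might be set per layer; this sketch only shows where the coefficient enters the sum.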

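
For readers unfamiliar with Quality Focal Loss, the following is a minimal per-sample sketch of the loss as defined in the Generalized Focal Loss paper (Li et al., 2020): QFL(σ) = −|y − σ|^β [(1 − y) log(1 − σ) + y log σ], where σ is the predicted score and y ∈ [0, 1] is a soft quality label such as the IoU with the ground-truth box. How the proposed algorithm wires this loss into YOLOv5 is not stated in the abstract; the function name, the `eps` clamp, and the default β = 2 are assumptions made for illustration.

```python
import math


def quality_focal_loss(sigma, y, beta=2.0, eps=1e-12):
    """Quality Focal Loss for a single prediction.

    sigma: predicted score in (0, 1), e.g. after a sigmoid.
    y:     soft quality target in [0, 1], e.g. IoU for positives, 0 for negatives.
    beta:  modulating exponent (2.0 is the value suggested in the GFL paper).
    eps:   clamp added here to keep log() finite (an assumption of this sketch).
    """
    sigma = min(max(sigma, eps), 1.0 - eps)
    # Binary cross-entropy against the soft label y.
    bce = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    # Down-weight predictions that are already close to the quality target.
    return abs(y - sigma) ** beta * bce


loss_far = quality_focal_loss(0.1, 0.9)    # prediction far from the target
loss_near = quality_focal_loss(0.8, 0.9)   # prediction close to the target
loss_exact = quality_focal_loss(0.7, 0.7)  # prediction equal to the target
```

The modulating term |y − σ|^β vanishes when the score matches the quality label, so well-estimated samples contribute little and training focuses on hard ones.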

RAN Xiansheng, SU Shanjie, CHEN Junhao, ZHANG Zhiyun. Object Detection Algorithm for Complex Road Scenes Based on Adaptive Feature Fusion[J]. Computer Engineering and Applications, 2023, 59(24): 216-226.

冉险生, 苏山杰, 陈俊豪, 张之云. 自适应特征融合的复杂道路场景目标检测算法[J]. 计算机工程与应用, 2023, 59(24): 216-226.
