Improved Road Object Detection Algorithm for YOLOv8n

doi:10.3778/j.issn.1002-8331.2403-0383

Abstract

Abstract: To address the low detection accuracy and high missed-detection rate caused by varying object scales and complex background interference in road scenes, an improved road object detection algorithm based on YOLOv8n is proposed. First, the diverse branch block (DBB) is introduced to construct the C2fDBB module, which replaces the original C2f module and strengthens the network's ability to extract multi-scale features. Second, building on the path aggregation network (PANet) and drawing on the idea of the asymptotic feature pyramid network (AFPN), the path aggregation progressive feature pyramid network (PA-AFPN) feature fusion method is proposed to improve the network's fusion of multi-scale features. Third, the SPPF2_TA module (SPPF with dual-branch structure incorporating triplet attention) is designed: it adds an average pooling branch and a triplet attention (TA) mechanism to SPPF (spatial pyramid pooling fast), integrating multi-scale information and reducing the impact of background interference on detection. Finally, MPDIoU is adopted as the bounding-box regression loss in place of the original loss function, accelerating convergence and improving localization accuracy. Experiments on the public road datasets BDD100K and SODA10M show that the improved algorithm raises mAP@0.5 by 5.7 and 7.3 percentage points, respectively, over the baseline algorithm, while reducing the computational cost by 0.6 GFLOPs. Compared with other mainstream object detection methods, the proposed algorithm shows clear advantages in FLOPs, FPS, and mAP@0.5, making it better suited to object detection in road scenes.

Key words: YOLOv8, structural reparameterization, asymptotic feature pyramid network (AFPN), road object, attention mechanism
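
To make the MPDIoU loss named in the abstract concrete, the following is a minimal sketch for a single pair of axis-aligned boxes. It follows the definition in the MPDIoU paper as commonly stated: IoU minus the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared diagonal of the input image. The function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the scalar pure-Python form are assumptions made for illustration; this is not the authors' training code.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Illustrative MPDIoU loss for one box pair.

    Boxes are (x1, y1, x2, y2) in pixels with x1 < x2 and y1 < y2.
    img_w and img_h are the width and height of the input image.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area; zero when the boxes do not overlap.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou
```

Unlike plain IoU, the corner-distance terms remain informative when the boxes do not overlap, so the loss still grows as the predicted box moves farther from the target.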


GAO Deyong, CHEN Taida, MIAO Lan. Improved Road Object Detection Algorithm for YOLOv8n[J]. Computer Engineering and Applications, 2024, 60(16): 186-197.

