[1] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[2] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[3] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[4] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, October 11-14, 2016: 21-37.
[5] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 213-229.
[6] GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[J]. arXiv:2107.08430, 2021.
[7] LI C, LI L, JIANG H, et al. YOLOv6: a single-stage object detection framework for industrial applications[J]. arXiv:2209.02976, 2022.
[8] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7464-7475.
[9] LIU Y, ZHANG R F, LIU Y H, et al. Obstacle detection method for guide system based on CE-YOLOX[J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(9): 1281-1292.
[10] GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression[J]. arXiv:2205.12740, 2022.
[11] TIAN P, MAO L. Improved YOLOv8 object detection algorithm for traffic sign target[J]. Computer Engineering and Applications, 2024, 60(8): 202-212.
[12] LIU H, LIU X M, LIU D D. Research on optimization of YOLOv5 detection algorithm for object in complex road[J]. Computer Engineering and Applications, 2023, 59(18): 207-217.
[13] CHEN J, KAO S, HE H, et al. Run, don’t walk: chasing higher FLOPs for faster neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 12021-12031.
[14] LAU K W, PO L M, REHMAN Y A U. Large separable kernel attention: rethinking the large kernel attention design in CNN[J]. Expert Systems with Applications, 2024, 236: 121352.
[15] LI H, LI J, WEI H, et al. Slim-neck by GSConv: a better design paradigm of detector architectures for autonomous vehicles[J]. arXiv:2206.02424, 2022.
[16] YANG L, ZHANG R Y, LI L, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks[C]//Proceedings of the International Conference on Machine Learning, 2021: 11863-11874.
[17] DING X, ZHANG X, HAN J, et al. Diverse branch block: building a convolution as an inception-like unit[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 10886-10895.
[18] SHI D. TransNeXt: robust foveal visual perception for vision transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 17773-17783.
[19] DING X, ZHANG Y, GE Y, et al. UniRepLKNet: a universal perception large-kernel ConvNet for audio video point cloud time-series and image recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 5513-5524.
[20] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 12993-13000.
[21] LIU C, WANG K, LI Q, et al. Powerful-IoU: more straightforward and faster bounding box regression loss with a nonmonotonic focusing mechanism[J]. Neural Networks, 2024, 170: 276-284.
[22] LEE J, PARK S, MO S, et al. Layer-adaptive sparsity for the magnitude-based pruning[J]. arXiv:2010.07611, 2020.
[23] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the International Conference on Machine Learning, 2015: 448-456.
[24] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1251-1258.
[25] SHAZEER N. GLU variants improve transformer[J]. arXiv:2002.05202, 2020.
[26] ZHANG J, CHEN Z, YAN G, et al. Faster and lightweight: an improved YOLOv5 object detector for remote sensing images[J]. Remote Sensing, 2023, 15(20): 4974.
[27] XIA H, YAO C, TAN Y, et al. A dataset for the visually impaired walk on the road[J]. Displays, 2023, 79: 102486.
[28] TANG W, LIU D, ZHAO X, et al. A dataset for the recognition of obstacles on blind sidewalk[J]. Universal Access in the Information Society, 2023, 22(1): 69-82.
[29] ZHAO Y, LV W, XU S, et al. DETRs beat YOLOs on real-time object detection[J]. arXiv:2304.08069, 2023.
[30] LIU X, PENG H, ZHENG N, et al. EfficientViT: memory efficient vision transformer with cascaded group attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14420-14430.
[31] MA X, DAI X, BAI Y, et al. Rewrite the stars[J]. arXiv:2403.19967, 2024.
[32] XIA Z, PAN X, SONG S, et al. Vision transformer with deformable attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 4794-4803.
[33] ZHAO Y, LV W, XU S, et al. DETRs beat YOLOs on real-time object detection[J]. arXiv:2304.08069, 2023.
[34] HUANG H, CHEN Z, ZOU Y, et al. Channel prior convolutional attention for medical image segmentation[J]. arXiv:2306.05196, 2023.
[35] HU M, FENG J, HUA J, et al. Online convolutional re-parameterization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 568-577.
[36] HUANG L, LI W, SHEN L, et al. YOLOCS: object detection based on dense channel compression for feature spatial solidification[J]. arXiv:2305.04170, 2023.
[37] REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 658-666.
[38] MA S L, XU Y. MPDIoU: a loss for efficient and accurate bounding box regression[J]. arXiv:2307.07662, 2023.
[39] ZHANG H, ZHANG S. Shape-IoU: more accurate metric considering bounding box shape and scale[J]. arXiv:2312.17663, 2023.
[40] TONG Z, CHEN Y, XU Z, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism[J]. arXiv:2301.10051, 2023.
[41] FANG G, MA X, SONG M, et al. DepGraph: towards any structural pruning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 16091-16101.
[42] WANG C Y, YEH I H, LIAO H Y M. YOLOv9: learning what you want to learn using programmable gradient information[J]. arXiv:2402.13616, 2024.
[43] WANG A, CHEN H, LIU L, et al. YOLOv10: real-time end-to-end object detection[J]. arXiv:2405.14458, 2024.
[44] ZHANG J, LV Y, TAO J, et al. A robust real-time anchor-free traffic sign detector with one-level feature[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(2): 1437-1451.