计算机工程与应用 (Computer Engineering and Applications), 2025, Vol. 61, Issue (14): 148-162. DOI: 10.3778/j.issn.1002-8331.2412-0023

• 目标检测专题 (Special Topic: Object Detection) •

DPRT-YOLO:智能网联汽车复杂驾驶环境实时目标检测器

董一兵,曾辉,李建科,侯少杰,石磊   

  1. 河北经贸大学 管理科学与信息工程学院,石家庄 050061
  2. 河北省产业转型升级服务中心 高技术发展部,石家庄 050000
  • 出版日期:2025-07-15 发布日期:2025-07-15

DPRT-YOLO: Real-Time Object Detector for Intelligent and Connected Vehicles in Complex Driving Environments

DONG Yibing, ZENG Hui, LI Jianke, HOU Shaojie, SHI Lei   

  1. School of Management Science and Information Engineering, Hebei University of Economics and Business, Shijiazhuang 050061, China
  2. Department of High Technology Development, Hebei Center of Industrial Transformation and Upgrading Service, Shijiazhuang 050000, China
  • Online: 2025-07-15 Published: 2025-07-15

摘要: 目标检测是智能网联汽车视觉感知系统的一项基本任务,可为先进驾驶辅助系统提供基础数据和决策依据。然而,在低光照和恶劣天气等复杂环境中,车载目标检测算法面临小目标检测性能不佳、漏检率和误检率偏高的挑战。针对这一挑战,发展了一种面向智能网联汽车的实时目标检测器(DPRT-YOLO),通过对流行的YOLOv10模型进行改造,使其更加适用于复杂驾驶环境中的目标检测任务,并通过在NVIDIA边缘计算平台上开展消融和对比实验,验证了算法的有效性。设计了增强加权多分支特征融合网络(EWMFFN),引入浅层加权融合和多分支加权融合模块,消除特征融合过程中的层间干扰,设计星形拓扑特征交互结构,提升模型对小尺度目标的检测能力,同时保持了网络结构的轻量化设计。融合卷积门控线性单元(convolutional gated linear units,CGLU)与卷积加法自注意力(convolutional additive token mixer,CATM),通过局部-全局双通路机制建立小目标尺度信息的长期上下文关系并保持模型的轻量化。为了评估模型在真实算力场景中的检测性能,将其部署在NVIDIA Jetson Xavier NX平台上,采用NVIDIA TensorRT FP16量化加速,在BDD100K和TT100K测试集上开展推理实验,并与基准模型进行对比,结果显示:(1)检测精度方面,与YOLOv10n和YOLO11n相比,改进模型的mAP@0.5指标分别提升了6.1和7.4个百分点,mAP@0.5:0.95指标分别提升了3.6和4.2个百分点,同时,参数量分别降低了26.1%和34.9%。(2)检测速度方面,改进模型Small和Nano两种版本的推理速度分别达到了29 FPS和35 FPS。实验结果表明:与参考模型相比,改进算法在复杂驾驶环境中的表现更加优异,在检测精度与检测速度之间达到了更好的平衡,适于部署在智能网联汽车的环境感知系统中。

关键词: 实时目标检测, 复杂驾驶环境, DPRT-YOLO, 多尺度特征融合, Transformer

Abstract: Object detection is a fundamental task of the visual perception system in intelligent and connected vehicles (ICV), providing basic data and decision support for advanced driver assistance systems (ADAS). However, in complex scenes with low illumination or severe weather, existing vehicle-mounted object detectors perform poorly on small-scale objects such as traffic signs, traffic signals, and distant vehicles, leading to high false negative and false positive rates. To address this challenge, a real-time object detector for ICV (DPRT-YOLO) is developed by modifying the popular YOLOv10 model to better suit object detection in complex driving environments, and its effectiveness is verified through ablation and comparative experiments on an NVIDIA edge computing platform. First, an enhanced weighted multi-branch feature fusion network (EWMFFN) is designed: shallow weighted fusion and multi-branch weighted fusion modules eliminate inter-layer interference during feature fusion, and a star-topology feature interaction structure improves the detection of small-scale objects while keeping the network lightweight. Second, convolutional gated linear units (CGLU) and the convolutional additive token mixer (CATM) are combined to construct the feature extraction module (ECAS-ViT), which uses a local-global dual-path mechanism to build long-range contextual relationships for small-scale object information while remaining lightweight. To evaluate the model under realistic compute constraints, it is deployed on the NVIDIA Jetson Xavier NX embedded platform with NVIDIA TensorRT FP16 quantization acceleration, and inference experiments are conducted on the BDD100K and TT100K test sets against baseline models. The results show that: (1) In terms of detection accuracy, compared with YOLOv10n and YOLO11n, the mAP@0.5 of the proposed model improves by 6.1 and 7.4 percentage points, respectively, and the mAP@0.5:0.95 improves by 3.6 and 4.2 percentage points, respectively, while the parameter count decreases by 26.1% and 34.9%, respectively. (2) In terms of detection speed, the Small and Nano versions of the proposed model reach inference speeds of 29 FPS and 35 FPS, respectively. The experimental results demonstrate that the proposed detector outperforms the reference models in complex driving environments and achieves a better balance between detection accuracy and speed, making it suitable for deployment in the environmental perception system of ICV.
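
The abstract does not state how the weighted fusion modules in EWMFFN combine their input branches. Purely as an illustration of the general idea of weighted feature fusion, the sketch below implements fast normalized fusion, a scheme commonly used in BiFPN-style necks: each branch gets a non-negative weight, and the weights are normalized before an element-wise sum. The function names, the `eps` value, and the use of plain Python lists in place of feature maps are assumptions made for this example; this is not the authors' implementation.

```python
from typing import List, Sequence


def normalized_fusion_weights(raw_weights: Sequence[float], eps: float = 1e-4) -> List[float]:
    """Clamp raw weights at zero and normalize them so they sum to roughly 1.

    In a trained network the raw weights would be learnable parameters;
    here they are plain floats (an assumption for illustration).
    """
    clamped = [max(w, 0.0) for w in raw_weights]
    total = sum(clamped) + eps  # eps avoids division by zero
    return [w / total for w in clamped]


def weighted_fuse(branches: Sequence[Sequence[float]], raw_weights: Sequence[float]) -> List[float]:
    """Fuse same-length feature vectors as a normalized weighted element-wise sum."""
    if len(branches) != len(raw_weights):
        raise ValueError("need exactly one weight per branch")
    length = len(branches[0])
    if any(len(branch) != length for branch in branches):
        raise ValueError("all branches must have the same length")
    weights = normalized_fusion_weights(raw_weights)
    return [
        sum(w * branch[i] for w, branch in zip(weights, branches))
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]  # hypothetical shallow-layer features
    deep = [3.0, 4.0]     # hypothetical deep-layer features
    print(weighted_fuse([shallow, deep], raw_weights=[1.0, 1.0]))
```

With equal raw weights the result is close to the element-wise mean of the branches; a branch whose raw weight is negative is clamped to zero and contributes nothing to the fused output.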

Key words: real-time object detection, complex driving environments, DPRT-YOLO, multi-scale feature fusion, Transformer