Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (13): 124-137. DOI: 10.3778/j.issn.1002-8331.2411-0390

• Special Issue on Object Detection •

YOLO-CDC: Improved YOLOv8 Vehicle Object Detection Algorithm

ZHANG Haochen, ZHANG Zhulin, SHI Ruiyan, WANG Wenhan, LEI Zhennuo   

  1. School of Automotive Engineering, Shandong Jiaotong University, Jinan 250357, China
  • Online:2025-07-01 Published:2025-06-30

YOLO-CDC:优化改进YOLOv8的车辆目标检测算法

张浩晨,张竹林,史瑞岩,王文翰,雷镇诺   

  1. 山东交通学院 汽车工程学院,济南 250357

Abstract: To address the low detection and localization accuracy for small and occluded targets in traffic scenes, and the sensitivity of detection to contrast changes and image noise, this paper proposes YOLO-CDC, a vehicle object detection model based on YOLOv8. First, the C2f module is replaced with a C2Former module that incorporates the global feature extraction capability of the Transformer structure, and a gated repetitive multi-layer perception (GRMLP) unit is designed to strengthen the nonlinear expressive capability of the Transformer branch and achieve high-quality global information aggregation. Second, a multi-scale feature fusion branch is designed to integrate features: deformable convolution is combined with the SPDconv module to extract high-resolution information from the P2 layer and capture finer edge features, and a CSP feature reconstruct module (CSP_FRM) is designed to balance the computational burden introduced by large-kernel convolution and frequency-domain features. Finally, a multi-scale feature enhancement module is designed: large-kernel convolution and strip convolution expand the receptive field and supplement context information, while non-strided convolution supplements the fine-grained local details of the target. In addition, a multi-frequency single-channel attention mechanism is combined with channel features, and an attention mechanism coupling spatial and frequency-domain information is designed to suppress image noise and background interference. Experimental results show that the mean average precision (mAP50-95) on the UA-DETRAC and SODA10M datasets increases by 3.6 and 4.1 percentage points, respectively, with detection speeds of 340.1 FPS and 341.4 FPS, respectively, demonstrating the higher detection and localization accuracy and better generalization of the improved algorithm.
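
SPD-style modules such as the SPDconv module named in the abstract are commonly built from a space-to-depth rearrangement followed by a non-strided convolution, so that downsampling moves pixels into channels instead of discarding them. The following is a minimal sketch of the rearrangement step only, assuming a scale of 2 and plain nested lists in place of tensors. The function name `space_to_depth` and the channel ordering are illustrative assumptions and are not taken from the paper.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange a C x H x W nested list into (C*scale**2) x (H/scale) x (W/scale).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    out = []
    # Each (row_offset, col_offset) pair selects one sub-sampled grid.
    # Every input pixel lands in exactly one grid, so no information is lost.
    for col_offset in range(scale):
        for row_offset in range(scale):
            for channel in feature_map:
                sub_grid = [row[col_offset::scale] for row in channel[row_offset::scale]]
                out.append(sub_grid)
    return out


# One 4x4 channel holding the values 0..15.
x = [[[r * 4 + c for c in range(4)] for r in range(4)]]
y = space_to_depth(x)
print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width after rearrangement
```

For the 1 x 4 x 4 input above, the result has 4 channels of size 2 x 2. A convolution with stride 1 applied to this output would then mix the four sub-grids, which is the role the non-strided convolution plays in this kind of design.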

Key words: traffic scene, YOLOv8, C2Former, multi-frequency single-channel attention, frequency-domain information

摘要: 针对交通场景小目标、遮挡目标检测和定位精度低,受对比度变化及图像噪声影响,提出一种基于YOLOv8的车辆目标检测模型(YOLO-CDC)。提出了结合Transformer结构全局特征提取能力的模块C2Former代替C2f模块,设计了重感知的门控线性单元GRMLP(gated repetitive multi-layer perception)优化Transformer分支非线性表达能力,实现高质量全局信息聚合;设计了多尺度特征融合分支整合特征,结合可变形卷积和SPDconv模块提取P2层高分辨率信息,捕获更精细的边缘特征;设计了CSP_FRM(CSP feature reconstruct module)模块平衡引入大核卷积及频域特征的计算负担;构建了一种特征增强模块,采用大核卷积与条状卷积扩大感受野补充上下文信息,结合非跨步卷积补充局部细粒度信息;采用了一种多频单通道注意力形式与通道特征结合以及设计了一种耦合空间和频域信息的注意力机制抑制图像噪声和背景干扰。实验结果表明,在UA-DETRAC数据集和SODA10M数据集上平均精度(mAP50-95)分别提升了3.6、4.1个百分点,检测速度分别为340.1、341.4 FPS,具有更高检测和定位精度及泛化性。

关键词: 交通场景, YOLOv8, C2Former, 多频单通道注意力, 频域信息