Computer Engineering and Applications (计算机工程与应用) ›› 2026, Vol. 62 ›› Issue (1): 1-19. DOI: 10.3778/j.issn.1002-8331.2504-0330

• Hot Topics and Reviews •

Review of Monocular Vision Object Detection Based on Deep Learning

LIU Guichao (刘桂超), WANG Huaiguang+ (王怀光), REN Guoquan (任国全), WU Dinghai (吴定海)

  1. Shijiazhuang Campus of Army Engineering University, Shijiazhuang 050051, China
  • Received: 2025-04-23 Revised: 2025-06-09 Online: 2026-01-01 Published: 2025-12-31

Abstract: Monocular vision object detection offers low hardware cost and high real-time performance, and it has gradually become a core technology in fields such as autonomous driving and intelligent surveillance. However, geometric ambiguity, robustness to occlusion, and small-object detection accuracy remain bottlenecks in current research. Working mainly at the algorithm level, this paper systematically and quantitatively analyzes progress in monocular vision object detection along three dimensions: algorithm evolution, performance evaluation, and lightweight design. First, single-stage detection algorithms are divided into classic convolutional architectures and Transformer architectures; their structural innovations and performance bottlenecks are summarized, and the trade-offs among accuracy, speed, and complexity are identified. Second, the paper examines how lightweight techniques are integrated with object detection algorithms at three levels: network design, algorithm optimization, and model compression. It also consolidates the multi-dimensional evaluation metrics of the three main official datasets used to train and evaluate object detectors, and builds a cross-model comparison framework on the MS-COCO-2017 dataset to compare single-stage detectors of different architectures side by side. Finally, the paper looks ahead to frontier directions such as multimodal fusion and lightweight improvement, aiming to provide a systematic reference for the engineering application of, and theoretical advances in, monocular vision object detection algorithms.

Key words: one-stage object detection, deep learning, lightweight model, classic convolution architecture, Transformer architecture