Computer Engineering and Applications, 2025, Vol. 61, Issue (17): 304-316. DOI: 10.3778/j.issn.1002-8331.2405-0415

• Graphics and Image Processing •

Multi-Scale Target Detection Algorithm for Dense Pedestrian Detection Task

XU Zhenfeng, XU Yunfeng, YU Zizhou, MEI Wei, ZHANG Yan

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China
  2. Shijiazhuang Campus of Army Engineering University, Shijiazhuang 050003, China
  • Online: 2025-09-01 Published: 2025-09-01

Abstract: In dense pedestrian detection, low detection accuracy, missed detections, and false detections have long been challenging problems. They arise mainly because most scenes contain many targets at widely varying scales, and this scale variation reduces detection accuracy. To address this, a multi-scale pedestrian detection network (MPDNet) based on an improved YOLOv5s is proposed, with improvements in three areas. First, in the backbone, a spatial position attention module is added to the C3 module and an improved ViTv3Block module is introduced, which effectively strengthens feature extraction. Second, the feature fusion part is built on an improved asymptotic feature pyramid network (AFPN) that performs cross-layer feature fusion with fewer parameters and less computation. Third, a spatial enhanced multi-scale attention module (SEMA) is added at the end of the feature fusion network to improve the model's ability to localize targets. Experimental results show that, compared with YOLOv5s, MPDNet improves AP50 by 4.2 and 3.2 percentage points and AP50:95 by 5.0 and 3.9 percentage points on the WiderPerson and CrowdHuman dense pedestrian detection datasets, respectively. MPDNet performs dense pedestrian detection well in complex scenes.

Key words: YOLOv5s, dense pedestrian detection, progressive multi-scale feature fusion, target detection, attention mechanism
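The abstract does not specify how the spatial position attention module inside C3 is built. As a point of reference only, the sketch below shows one common form of spatial attention (CBAM-style): the channel mean and channel max at each pixel are combined into a weight in (0, 1) that rescales every channel at that location. It is not MPDNet's module. The 7×7 convolution that CBAM normally applies to the pooled maps is simplified here to a per-pixel weighted sum with hypothetical parameters `w_avg`, `w_max`, and `bias`, and feature maps are assumed to be plain nested lists of shape C × H × W.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """CBAM-style spatial attention on a C x H x W map given as nested lists.

    Simplification (assumption): the usual 7x7 conv over the [mean, max]
    maps is replaced by a per-pixel weighted sum (a 1x1 kernel) with
    hypothetical weights w_avg, w_max and bias.
    Returns (reweighted feature map, H x W attention map).
    """
    c = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    att = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Values of all channels at this spatial position.
            column = [feat[k][i][j] for k in range(c)]
            avg = sum(column) / c
            mx = max(column)
            # One weight per location, shared by every channel.
            att[i][j] = sigmoid(w_avg * avg + w_max * mx + bias)
    # Rescale each channel by the same spatial attention map.
    out = [[[feat[k][i][j] * att[i][j] for j in range(w)]
            for i in range(h)]
           for k in range(c)]
    return out, att
```

For example, `spatial_attention([[[0.0, 4.0]], [[0.0, 2.0]]])` yields a weight of 0.5 at the all-zero pixel and close to 1 at the strongly activated one, so the gate suppresses weak locations relative to strong ones.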
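AFPN, as originally proposed, fuses features from different pyramid levels with an adaptive spatial fusion operation: each level receives a learned per-pixel weight, and the weights at each location are normalized across levels before summing (here with a softmax, as in ASFF). The abstract does not say how MPDNet's improved AFPN changes this, so the sketch below illustrates only the generic weighting step. It assumes the levels have already been resized to a common resolution and reduced to a single channel, and the weight logits are fixed hypothetical values standing in for the 1×1 convolutions a real network would learn.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def adaptive_spatial_fusion(levels, logits):
    """Fuse N same-sized H x W maps with per-pixel softmax weights.

    levels: N maps already resized to a common resolution.
    logits: N maps of unnormalized fusion weights (hypothetical fixed values
            here; a real network would predict them with learned 1x1 convs).
    """
    n = len(levels)
    h, w = len(levels[0]), len(levels[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalize the N level weights at this location so they sum to 1.
            weights = softmax([logits[k][i][j] for k in range(n)])
            fused[i][j] = sum(weights[k] * levels[k][i][j] for k in range(n))
    return fused
```

With `levels = [[[1.0, 1.0]], [[3.0, 3.0]]]` and `logits = [[[0.0, 10.0]], [[0.0, 0.0]]]`, the first pixel averages the two levels (2.0) while the second is dominated by level 0 (about 1.0). The fusion can therefore favor different levels at different locations.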
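The AP50 and AP50:95 figures in the abstract differ only in the IoU threshold a detection must reach to count as a true positive. Under the COCO convention, AP50 uses a single threshold of 0.50, while AP50:95 averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05, so it rewards tighter localization. The sketch below computes IoU for axis-aligned boxes in (x1, y1, x2, y2) form and lists the thresholds a given prediction would pass. It illustrates the matching criterion only, not a full AP computation.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# COCO-style IoU thresholds averaged by AP50:95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(pred, gt):
    """IoU thresholds at which `pred` would be matched to `gt`."""
    v = iou(pred, gt)
    return [t for t in THRESHOLDS if v >= t]
```

A prediction shifted 2 units sideways against a 10 × 20 ground-truth box, `thresholds_passed((2, 0, 12, 20), (0, 0, 10, 20))`, has IoU 2/3: it is a match for AP50 but passes only 4 of the 10 AP50:95 thresholds.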