Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (23): 161-172. DOI: 10.3778/j.issn.1002-8331.2409-0077

• Pattern Recognition and Artificial Intelligence •

Refined Multi-Scale Feature and Parallel Attention Based Crowd Detection

ZHANG Xin (张欣), KANG Shining (亢世宁), YANG Yuqi (杨寓淇), WANG Jun (王珺), MA Zhiyuan (马致远)

  1. College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071000, China
  2. Machine Vision Technology Innovation Center of Hebei Province, Hebei University, Baoding, Hebei 071000, China
  • Online: 2025-12-01 Published: 2025-12-01

Abstract: Crowd detection is widely used in autonomous driving, traffic management, and intelligent security. High crowd density, frequent pedestrian occlusion, large scale variation, and irregular crowd distribution make it one of the challenging problems in computer vision. To further exploit the rich multi-scale information in dense scenes and to handle irregular crowd distribution and shapes, this paper proposes a crowd detection algorithm built on Sparse R-CNN and based on refined multi-scale features and parallel attention, named RMF R-CNN (refined multiscale feature R-CNN). First, a receptive field fusion module applies several dilated convolutions with different dilation scales in parallel to extract refined scale information. Then, a parallel attention module built from dilated convolution attention and deformable convolution attention perceives crowd distribution and shape information at different scales. Finally, to mitigate the loss sensitivity caused by mislabeled data and pedestrian scale, a dynamic loss weight is added to the original loss function so that the loss changes dynamically with pedestrian scale and prediction accuracy, which improves the model's generalization ability. Experimental results show that the proposed algorithm achieves an AP of 91.1%, an MR⁻² of 44.5%, and a Recall of 96.7% on datasets such as CrowdHuman and CityPersons. The algorithm can improve crowd detection performance in dense scenes to a certain extent.

Key words: crowd detection, refined multi-scale feature, attention mechanism, Sparse R-CNN, dynamic loss weight
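
The abstract describes the receptive field fusion module only at a high level: several dilated convolutions with different scales run in parallel and their outputs are combined. The toy sketch below illustrates that general idea in one dimension using only the Python standard library. It is not the authors' implementation: the function names, the dilation rates (1, 2, 3), the 3-tap kernel, and fusion by element-wise summation are all assumptions chosen for illustration. The only general fact it relies on is that a kernel of size k with dilation d covers d·(k − 1) + 1 input positions.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel size so the kernel can be centred on each position.
    """
    half = (len(kernel) // 2) * dilation  # offset of the first tap from the centre
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):     # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def fuse_parallel_branches(signal: Sequence[float],
                           kernel: Sequence[float],
                           dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Run one dilated convolution per rate and sum the branches element-wise.

    Summation is an assumed fusion rule; the abstract does not state how
    the branches are combined.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    ones = [1, 1, 1]
    for d in (1, 2, 3):
        print(d, receptive_field(3, d), dilated_conv1d(impulse, ones, d))
    print(fuse_parallel_branches(impulse, ones))
```

Feeding a single impulse through the branches shows how each dilation rate spreads the response over a wider neighbourhood, so the fused output mixes context from several scales at once. A real detector would use learned 2-D convolutions on feature maps instead of this fixed 1-D kernel.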