Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (16): 196-204. DOI: 10.3778/j.issn.1002-8331.2204-0225

• Graphics and Image Processing •

Dense Pedestrian Detection Algorithm Based on Improved ResNet-CrowdDet

HAN Wenjing, HE Ning, LIU Shengjie, YU Haigang   

  1. College of Smart City, Beijing Union University, Beijing 100101, China
  2. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  • Online: 2023-08-15 Published: 2023-08-15

Abstract: Pedestrian detection is widely used in autonomous driving, passenger flow counting, and intelligent surveillance. Pedestrians in these scenes are usually densely packed, vary in scale and pose, and occlude one another, so current dense pedestrian detectors suffer from low detection accuracy and high miss rates. The CrowdDet algorithm, built on ResNet-50-FPN, addresses dense occlusion and achieves good results on the CrowdHuman dataset. Taking CrowdDet as the baseline detector, this paper proposes an improved algorithm with two modules: a backbone network, BoINet (bottleneck involution network), and a sparse detection head, DHCDet (double-head CrowdDet). Unlike the baseline ResNet, which relies solely on convolutions that are local and learn static parameters, BoINet incorporates involution, a dynamic operator capable of long-range interaction, into feature extraction, which strengthens the representation of pedestrian features. DHCDet improves CrowdDet with a Double-Head structure and replaces the non-local (NL) self-attention mechanism in Double-Head with spectral non-local (SNL), further improving the detector's classification and regression performance. On the CrowdHuman dataset, the improved method achieves 91.15% AP, 39.74% MR⁻² (log-average miss rate), and 83.60% JI, giving higher detection accuracy and a lower miss rate than the baseline detector.

Key words: dense pedestrian detection, enhanced feature representation, BoINet, improved classification and regression performance, DHCDet
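
The contrast the abstract draws between static convolution and dynamic involution can be illustrated with a small sketch. This is not the paper's BoINet implementation: the function name `involution2d`, the one-layer kernel generator `w_gen`, and the single-group setting are assumptions made for illustration (the published involution operator generates kernels with a small bottleneck network and supports several channel groups). The sketch shows the two properties that distinguish involution from convolution: the kernel is computed from the feature vector at each pixel, and that kernel is shared across channels.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def involution2d(x: FeatureMap, w_gen: List[List[float]], k: int) -> FeatureMap:
    """Single-group involution with zero padding and stride 1.

    w_gen is a hypothetical one-layer kernel generator with k*k rows of
    length C; it maps the feature vector at a pixel to a k x k kernel.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    assert len(w_gen) == k * k and all(len(row) == channels for row in w_gen)
    r = k // 2
    y = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            pixel = [x[c][i][j] for c in range(channels)]
            # Dynamic kernel: generated from the features at (i, j),
            # so it differs from one spatial position to the next.
            kernel = [sum(w * p for w, p in zip(row, pixel)) for row in w_gen]
            for c in range(channels):
                # The same kernel is applied to every channel.
                acc = 0.0
                for u in range(-r, r + 1):
                    for v in range(-r, r + 1):
                        ii, jj = i + u, j + v
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[(u + r) * k + (v + r)] * x[c][ii][jj]
                y[c][i][j] = acc
    return y


# Generator that writes the pixel's own value into the kernel centre,
# so each output equals the input value multiplied by itself.
center_only = [[0.0] for _ in range(9)]
center_only[4] = [1.0]
squared = involution2d([[[1.0, 2.0], [3.0, 4.0]]], center_only, 3)
```

In a convolution the kernel would be a fixed learned tensor reused at every position; here only `w_gen` is learned, and the kernel itself changes with the input. A larger `k` widens the neighbourhood each pixel aggregates over without adding per-channel kernel parameters.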