Computer Engineering and Applications, 2023, Vol. 59, Issue (19): 159-165. DOI: 10.3778/j.issn.1002-8331.2212-0250

• Graphics and Image Processing •

Improved DETR for Crowded Pedestrian Detection

FAN Rong, MA Xiaolu

  1. School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, Anhui 243002, China
  • Online: 2023-10-01 Published: 2023-10-01

Abstract: Crowded pedestrian detection is an active research topic in pedestrian detection. To reduce the missed detections of occluded and small pedestrians that are common in crowded scenes, an improved DETR object detection algorithm is proposed. First, because occluded targets in crowded scenes lose part of their features, the attention-based DETR is adopted as the baseline model, so that detection can still be completed when some features are missing. Second, because DETR detects small pedestrians poorly, a deformable attention encoder is introduced so that the model can make effective use of multi-scale feature maps, which carry a large amount of small-target information, and thereby improve detection accuracy for small pedestrians. Third, because the ResNet-50 backbone extracts and refines important features inefficiently, an improved EfficientNet backbone that incorporates the channel and spatial attention module CBAM is used as the feature extraction network, improving both the extraction and the refinement of important features. Fourth, because models built on attention-based detection modules train inefficiently, Smooth-L1 is combined with GIOU as the loss function during training, so that the model converges further and reaches a higher accuracy. Experimental results on the Wider-Person crowded pedestrian detection dataset show that the proposed algorithm exceeds YOLO-x by 0.039 and YOLO-v5 by 0.015 in AP50. The proposed algorithm is well suited to crowded pedestrian detection tasks.

Keywords: machine vision, crowded pedestrian detection, attention mechanism, DETR algorithm
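
The abstract states that Smooth-L1 is combined with GIOU (generalized intersection over union) as the training loss, but it does not say how the two terms are weighted. The following is a minimal sketch of one common way to combine them for a single predicted box and its matched ground-truth box. The function names, the (x1, y1, x2, y2) box format, the `beta` threshold, and the weights `w_l1` and `w_giou` are all illustrative assumptions, not values taken from the paper.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth-L1 distance over the box coordinates.

    Quadratic for small errors (|d| < beta), linear for large ones.
    beta=1.0 is an assumed default, not a value from the paper.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes. Result lies in (-1, 1].
    """
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])

    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))

    return inter / union - (hull - union) / hull


def box_loss(pred, target, w_l1=5.0, w_giou=2.0):
    """Weighted sum of Smooth-L1 and (1 - GIoU).

    The weights are placeholders chosen for illustration only.
    """
    return w_l1 * smooth_l1(pred, target) + w_giou * (1.0 - giou(pred, target))
```

The GIOU term stays informative even when the predicted and ground-truth boxes do not overlap, where plain IoU is zero, while the Smooth-L1 term penalizes coordinate errors directly. This complementarity is a plausible reason for combining them, although the abstract itself only reports that the combination lets the model converge to a higher accuracy.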