Improved DETR for Crowded Pedestrian Detection

doi:10.3778/j.issn.1002-8331.2212-0250

Abstract

Crowded pedestrian detection is an active research topic in pedestrian detection. To reduce missed detections of occluded and small pedestrians in crowded scenes, an improved DETR object detection algorithm is proposed. First, because occluded targets in crowded scenes lack part of their features, the attention-based DETR model is adopted as the baseline, so that targets can still be detected when some of their features are missing. Second, because DETR performs poorly on small pedestrians, a deformable attention encoder is introduced, allowing the model to exploit multi-scale feature maps, which carry much of the small-target information, and thereby improve small-pedestrian detection accuracy. Third, because the ResNet-50 backbone extracts and refines important features inefficiently, an improved EfficientNet backbone incorporating the convolutional block attention module (CBAM), a channel-spatial attention module, is used as the feature extraction network to strengthen the extraction and refinement of important features. Finally, because models with attention-based detection modules train inefficiently, the Smooth-L1 and GIoU losses are combined during training so that the model converges to a higher accuracy. Experimental results on the WiderPerson crowded pedestrian detection dataset show that the proposed algorithm outperforms YOLOX by 0.039 AP50 and YOLOv5 by 0.015 AP50, indicating that it is well suited to crowded pedestrian detection tasks.

Key words: machine vision, crowded pedestrian detection, attention mechanism, DETR algorithm
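The Smooth-L1 + GIoU box loss mentioned in the abstract can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, the Smooth-L1 threshold `beta`, and the default weights `w_l1=5.0` and `w_giou=2.0` (borrowed from the original DETR L1/GIoU weighting) are all assumptions, since the abstract specifies none of them.

```python
"""Hypothetical sketch of a combined Smooth-L1 + GIoU box loss (DETR-style).

Assumptions, not taken from the paper: boxes are (x1, y1, x2, y2) with
x2 > x1 and y2 > y1; the default weights 5.0 (Smooth-L1 term) and 2.0 (GIoU
term) are borrowed from the original DETR; beta is the Smooth-L1 threshold.
"""
from typing import Sequence

Box = Sequence[float]


def smooth_l1(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Sum of element-wise Smooth-L1 losses over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic for small errors (diff < beta), linear for large ones.
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two axis-aligned boxes; lies in (-1, 1]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Intersection (zero when the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest axis-aligned box C enclosing both boxes.
    area_c = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    # GIoU = IoU - (area of C not covered by the union) / area of C
    return iou - (area_c - union) / area_c


def box_loss(pred: Box, target: Box, w_l1: float = 5.0,
             w_giou: float = 2.0, beta: float = 1.0) -> float:
    """Loss for one matched prediction/target pair: weighted Smooth-L1 + (1 - GIoU)."""
    return w_l1 * smooth_l1(pred, target, beta) + w_giou * (1.0 - giou(pred, target))


if __name__ == "__main__":
    print(box_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(box_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
```

For identical boxes both terms vanish. Unlike plain IoU, the GIoU term stays informative for non-overlapping boxes because the enclosing box C grows with their separation. In a DETR-style detector such a loss is applied only to the prediction-target pairs produced by Hungarian matching.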

FAN Rong, MA Xiaolu. Improved DETR for crowded pedestrian detection[J]. Computer Engineering and Applications, 2023, 59(19): 159-165.
