Dense Pedestrian Detection Algorithm Based on Improved ResNet-CrowdDet

doi:10.3778/j.issn.1002-8331.2204-0225

Abstract

Abstract: Pedestrian detection is widely used in autonomous driving, passenger flow statistics, intelligent surveillance, and other fields. Pedestrians in these scenes are usually densely packed, vary in scale and pose, and occlude one another, so current dense pedestrian detection algorithms suffer from low detection accuracy and a high miss rate. The CrowdDet algorithm, built on ResNet-50-FPN, addresses dense occlusion and achieves good results on the CrowdHuman dataset. Taking it as the baseline detector, this paper proposes an improved algorithm with two modules: BoINet (bottleneck involution network) as the backbone and DHCDet (double-head CrowdDet) as the sparse detection head. Unlike the baseline ResNet, which relies only on convolutions that are local and learn static parameters, BoINet brings Involution, a dynamic convolution capable of long-range interaction, into feature extraction, which strengthens the representation of pedestrian features. DHCDet improves CrowdDet with the Double-Head structure and replaces the NL (non-local) self-attention mechanism in Double-Head with SNL (spectral non-local), further improving the detector's classification and regression performance. On the CrowdHuman dataset, the improved method achieves an AP of 91.15%, an MR-2 of 39.74%, and a JI of 83.60%, giving higher detection accuracy and a lower miss rate than the baseline detector.

Key words: dense pedestrian detection, enhanced feature representation, BoINet, improved classification and regression performance, DHCDet
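
The contrast the abstract draws between static convolution and Involution can be illustrated with a minimal sketch. It is not the authors' BoINet code: it assumes a single channel group, zero padding, and a single linear map `w` (a hypothetical stand-in for the kernel-generating function of the Involution paper) that produces a separate k×k kernel at every pixel from that pixel's own feature vector.

```python
def involution2d(x, w, k=3):
    """Simplified single-group involution (illustrative sketch only).

    x: feature map as nested lists with shape [H][W][C].
    w: hypothetical kernel-generating weights with shape [k*k][C];
       one linear layer stands in for the kernel-generating function.
    Returns a feature map with the same shape as x (zero padding).
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    r = k // 2
    out = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # The kernel is generated from the feature vector at (i, j),
            # so it changes from pixel to pixel (dynamic parameters).
            kernel = [sum(w[t][c] * x[i][j][c] for c in range(channels))
                      for t in range(k * k)]
            for du in range(-r, r + 1):
                for dv in range(-r, r + 1):
                    ni, nj = i + du, j + dv
                    if 0 <= ni < height and 0 <= nj < width:
                        weight = kernel[(du + r) * k + (dv + r)]
                        # One weight is shared by all channels in the group.
                        for c in range(channels):
                            out[i][j][c] += weight * x[ni][nj][c]
    return out


# Example: a 2x2 map with 2 channels; only the centre tap is generated,
# and its value equals channel 0 of the pixel being processed.
x = [[[1.0, 5.0], [2.0, 3.0]],
     [[1.0, 7.0], [0.5, 4.0]]]
w = [[0.0, 0.0] for _ in range(9)]
w[4] = [1.0, 0.0]
y = involution2d(x, w)
```

A standard convolution would apply one learned kernel at every position and a different one per channel; here the kernel is position-specific and channel-shared, which is the inversion that Involution refers to.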

HAN Wenjing, HE Ning, LIU Shengjie, YU Haigang. Dense Pedestrian Detection Algorithm Based on Improved ResNet-CrowdDet[J]. Computer Engineering and Applications, 2023, 59(16): 196-204.
