XSSD-P: Improved SSD Pedestrian Detection Algorithm

doi:10.3778/j.issn.1002-8331.2105-0337

Abstract

SSD (single shot multi-box detector) is a neural network algorithm widely used in pedestrian detection. To improve its detection accuracy and speed, this paper proposes an improved version of SSD, called XSSD-P. First, the Xception network is adopted as the backbone of XSSD-P, and the feature layers used for prediction are re-selected. Second, multi-scale convolution kernels and basic anchors are designed according to the typical shape and size of pedestrians, and the two are coupled; each basic anchor is resized to produce the anchors used for position regression. Finally, depthwise separable convolution replaces conventional convolution for prediction on the feature maps. Detection accuracy is compared on the INRIA, VOC, and COCO datasets, where XSSD-P achieves higher pedestrian detection accuracy than SSD and other mainstream algorithms. The generalization ability of XSSD-P is verified on the Caltech and MIT pedestrian datasets. In terms of speed, XSSD-P runs 30 FPS faster than SSD, an increase of 42.86%. The experimental results show that XSSD-P outperforms SSD in both detection accuracy and detection speed.

Keywords: pedestrian detection, single shot multi-box detector (SSD) algorithm, convolutional neural network, multi-scale convolution kernel, Xception network
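
Two quantitative points in the abstract can be checked with a small sketch. It first compares the weight count of a standard convolution with that of a depthwise separable one, which is the usual motivation for the substitution in the prediction layers. It then works out the baseline speed implied by the reported gain of 30 FPS (42.86%). The layer sizes and all function names are assumptions chosen for illustration and are not taken from the paper.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def baseline_fps(gain_fps, relative_gain):
    """Baseline speed implied by an absolute gain and the matching relative gain."""
    return gain_fps / relative_gain


# Hypothetical layer sizes, for illustration only (not from the paper).
k, c_in, c_out = 3, 256, 24
standard = conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(separable / standard, 3))  # 55296 8448 0.153

# The abstract reports a gain of 30 FPS, stated as 42.86%.
ssd_fps = baseline_fps(30, 0.4286)
print(round(ssd_fps), round(ssd_fps + 30))  # 70 100
```

For these assumed sizes the separable layer needs about 15% of the weights of the standard one, matching the general ratio 1/c_out + 1/k^2. The speeds of roughly 70 FPS for SSD and 100 FPS for XSSD-P are inferred from the two figures in the abstract rather than quoted from the paper.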

BAO Wenbin, ZHANG Dongquan. XSSD-P: Improved SSD Pedestrian Detection Algorithm[J]. Computer Engineering and Applications, 2022, 58(23): 132-141.
