Improvement of Lightweight Vehicle Detection Network Based on SSD

doi:10.3778/j.issn.1002-8331.2011-0018

Abstract

Abstract: Embedded camera devices performing object detection take too long to detect moving vehicles and therefore cannot return detection results in time. To address this problem, a lightweight convolutional network based on residual connections and an attention mechanism is proposed to improve the single shot multibox detector (SSD) object detection model. First, the h-swish and h-sigmoid activation functions replace the ReLU activation function in the residual blocks and the sigmoid activation function in the channel attention module, respectively, which reduces the computation required for training and inference. Second, the ratios used to generate SSD default boxes are redesigned according to the shape characteristics of vehicles viewed from the specific angles that occur in practical applications. In addition, the input image size and the effective receptive field of the feature maps are taken into account to reduce the computation of the feature fusion layers and of default box matching. Experimental results show that the improved SSD detection model achieves a mean average precision (mAP) of 94.87% on the BIT-Vehicle Dataset, 0.83 percentage points higher than the classic SSD object detection model, and reaches an average processing speed of 16 frames per second on a Raspberry Pi 3+ equipped with an Intel NCS2.

Key words: vehicle detection, machine vision, single shot multibox detector (SSD) model, deep learning, Raspberry Pi
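
The abstract names the h-swish and h-sigmoid activation functions without defining them. As a minimal sketch, assuming the definitions commonly given for MobileNetV3 (h-sigmoid(x) = ReLU6(x + 3) / 6 and h-swish(x) = x · h-sigmoid(x)), the scalar versions below show why they are cheap: both use only addition, clamping, and multiplication, with no exponential. The function names are illustrative and are not taken from the paper's code.

```python
def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of sigmoid: ReLU6(x + 3) / 6.

    Assumed definition (MobileNetV3 style); the abstract does not state it.
    """
    return relu6(x + 3.0) / 6.0


def h_swish(x: float) -> float:
    """Piecewise-linear approximation of swish: x * h_sigmoid(x)."""
    return x * h_sigmoid(x)


if __name__ == "__main__":
    # Print both activations at a few sample points.
    for value in (-4.0, -3.0, 0.0, 1.0, 3.0, 5.0):
        print(value, h_sigmoid(value), h_swish(value))
```

For inputs at or below -3 both functions return zero, and for inputs at or above 3 h-sigmoid saturates at 1 so h-swish passes its input through unchanged. That saturation is what lets the functions stand in for sigmoid and swish at lower cost on embedded hardware.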

XU Hao, YANG Degang, JIANG Qianqian, HE Linjin. Improvement of Lightweight Vehicle Detection Network Based on SSD[J]. Computer Engineering and Applications, 2022, 58(12): 209-217.
