Crowded Pedestrian Detection Algorithm Suitable for Vehicle Edge Computing

doi:10.3778/j.issn.1002-8331.2108-0456

Abstract

Abstract: On-board computing units have limited computing resources and cannot run object detection algorithms built on deep networks. To address this, a lightweight network model is designed for detection in crowded pedestrian scenes. The Darknet53 backbone is replaced with GhostNet, which uses cheap linear operations to generate feature maps similar to those of ordinary convolution and thereby reduces computational cost. A spatial pyramid pooling module is introduced for multi-scale fusion and stronger feature extraction. The convolutional block attention module is improved with a more efficient search mechanism, and the adaptive search breadth k is selected with the help of the classification network AlexNet to further improve network performance. The Grad-CAM algorithm is used to render heat maps of the network so that the attention mechanism can be analyzed. The CIoU loss function is introduced to fit the ground-truth box and the predicted box at their center points, which accelerates model convergence and yields more accurate localization. Experimental results show that the improved network reaches a precision of 75.35% for the pedestrian class on the WiderPerson pedestrian detection dataset; compared with the model before improvement, pedestrian precision and average precision increase by 5.76 and 3.28 percentage points, respectively. On the VisDrone dataset, the improved network reaches an average precision of 35.6%, which is close to that of YOLOv3, while processing up to 60 images per second. Compared with traditional single-stage detection algorithms, detection speed improves by up to 52.1%, which meets the real-time speed and accuracy requirements of mobile devices and on-board computing.

Key words: pedestrian detection, neural network, attention mechanism, lightweight
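
The CIoU loss named in the abstract can be illustrated with a minimal sketch. It follows the commonly published definition of CIoU (an IoU term, a normalized center-distance term, and an aspect-ratio term) and is not taken from the authors' implementation. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; box format and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

The center-distance term `rho2 / c2` is the part that pulls the predicted box toward the ground-truth center even when the two boxes do not overlap, which is the property the abstract credits for faster convergence.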

SHUAI Zequn, LI Jun, ZHANG Shiyi. Crowded Pedestrian Detection Algorithm Suitable for Vehicle Edge Computing[J]. Computer Engineering and Applications, 2023, 59(4): 156-164.
