Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (4): 156-164. DOI: 10.3778/j.issn.1002-8331.2108-0456

• Pattern Recognition and Artificial Intelligence •

Crowded Pedestrian Detection Algorithm Suitable for Vehicle Edge Computing

SHUAI Zequn, LI Jun, ZHANG Shiyi   

  1. School of Electromechanical and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400041, China
  • Online: 2023-02-15 Published: 2023-02-15

Abstract: On-board computing units have limited computing resources and processing power and cannot run object detection algorithms with deep networks. To address this, a lightweight network model is designed for detecting pedestrians in crowded scenes. The Darknet53 backbone is replaced by GhostNet, which uses cheap linear operations to obtain feature maps similar to those produced by ordinary convolution, reducing computational cost. A spatial pyramid pooling module is introduced to achieve multi-scale fusion and strengthen feature extraction. A more efficient search mechanism is proposed to improve the convolutional block attention module, and the adaptive search breadth [k] is selected jointly with the classification network AlexNet to further improve network performance. The Grad-CAM algorithm is used to visualize the network model as heat maps in order to analyze the attention mechanism. The CIOU loss function is introduced to fit the ground-truth box and the prediction at their center points, which accelerates model convergence and yields more accurate localization. The experimental results show that the improved network achieves a precision of 75.35% for the pedestrian class on the WiderPerson pedestrian detection dataset, improving pedestrian precision and average precision by 5.76 and 3.28 percentage points, respectively, over the model before improvement. On the Visdrone dataset, the average precision of the improved network reaches 35.6%, which is close to that of YOLOv3, while the detection speed reaches 60 images per second. Compared with traditional single-stage detection algorithms, the detection speed is improved by up to 52.1%, which meets the real-time speed and accuracy requirements of mobile devices and on-board computing.

Key words: pedestrian detection, neural network, attention mechanism, lightweight
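
The CIOU loss named in the abstract can be illustrated with a minimal sketch. It follows the commonly published CIoU formulation (an IoU term, a normalized center-distance term, and an aspect-ratio term); the abstract does not show the authors' implementation, so the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration only.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss between two boxes given as (x1, y1, x2, y2).

    Illustrative sketch of the standard formulation, not the paper's code.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between box centers (the center-point fitting term).
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    rho2 = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw * cw + ch * ch + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike a plain IoU loss, the center-distance term still gives a useful signal when the boxes do not overlap, which is consistent with the abstract's claim of faster convergence and more accurate localization.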