Computer Engineering and Applications, 2023, Vol. 59, Issue (23): 145-153. DOI: 10.3778/j.issn.1002-8331.2207-0496

• Pattern Recognition and Artificial Intelligence •

Improved Lightweight Target Detection Algorithm for YOLOv4-tiny

GUO Mingzhen, WANG Wei, SHEN Hongting, HOU Hongtao, LIU Kuan, LUO Zijiang

  1. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China
  • Online: 2023-12-01 Published: 2023-12-01

Abstract: To address slow feature extraction, insufficient real-time performance and poor algorithm portability in object detection deployed on embedded devices, a lightweight network, YOLOv4-tiny-CSPRDWConv, is proposed with YOLOv4-tiny as the baseline network. It is built on the CSPRDWConv (cross stage partial residual depthwise convolution) module and uses an improved Mosaic data augmentation to raise detection accuracy. The CSPRDWConv module moderately reduces the computational scale, so that the module greatly improves inference speed while maintaining accuracy. The improved Mosaic data augmentation method saves time in the augmentation process, makes full use of each image block, and filters out targets that are too small, making the model easier to train. On this basis, all convolutional layers of the backbone network use small convolution kernels, and a 5×5 depthwise separable convolution is used only in the last compression of the feature map, to ensure low latency and high accuracy. A weak SPP module is introduced in the Neck to combine local and global features and improve detection accuracy. The trained detection model is optimized with NEON instructions, and the convolutional layers are fused with the BN layers to speed up inference. The improved YOLOv4-tiny algorithm achieves a real-time detection speed of 1,308 FPS on 1080Ti hardware, and its inference speed on the RK3288 development board is about 8 FPS, about four times that of the YOLOv4-tiny baseline network. The mAP reaches 22.31%, an improvement of 0.61 percentage points over the baseline network. Experimental results show that the improved YOLOv4-tiny algorithm provides smoother and more efficient detection on embedded devices.

Key words: object detection, YOLOv4-tiny, embedded systems, CSPRDWConv module, Mosaic data enhancement
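
The abstract mentions fusing convolutional layers with BN layers to speed up inference. The following is a minimal sketch of that general technique in plain Python, not the authors' implementation: at inference time BN computes gamma * (y - mean) / sqrt(var + eps) + beta for each output channel, so it can be folded into the convolution by rescaling the kernel and adjusting the bias. The function names (fuse_conv_bn, conv_at_patch, batchnorm), the flattened-kernel representation and the toy numbers are all assumptions made for illustration; the NEON-level optimization is not shown.

```python
import math


def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BatchNorm into the preceding convolution.

    weights: one flattened kernel (list of floats) per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    Returns the fused (weights, bias) pair. Hypothetical helper.
    """
    fused_w, fused_b = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)        # per-channel BN scale
        fused_w.append([wi * scale for wi in w])
        fused_b.append((b - m) * scale + bt)  # shifted and rescaled bias
    return fused_w, fused_b


def conv_at_patch(weights, bias, patch):
    """Convolution output at one spatial position: dot(kernel, patch) + bias."""
    return [sum(wi * xi for wi, xi in zip(w, patch)) + b
            for w, b in zip(weights, bias)]


def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BatchNorm applied per output channel."""
    return [g * (y - m) / math.sqrt(v + eps) + bt
            for y, g, bt, m, v in zip(values, gamma, beta, mean, var)]


# Toy example (made-up numbers): 2 output channels, 4-element flattened patch.
weights = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.0, -0.5, 1.0]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
patch = [1.0, 2.0, -1.0, 0.5]

# Reference path: convolution followed by a separate BN layer.
reference = batchnorm(conv_at_patch(weights, bias, patch),
                      gamma, beta, mean, var)

# Fused path: a single convolution with folded parameters.
fused_w, fused_b = fuse_conv_bn(weights, bias, gamma, beta, mean, var)
fused = conv_at_patch(fused_w, fused_b, patch)

max_abs_diff = max(abs(a - b) for a, b in zip(reference, fused))
```

In this sketch the fused path should agree with the reference path up to floating-point rounding, while replacing two layers with one. That is why this kind of fusion typically reduces inference cost without changing the model's outputs.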