Research on Object Detection Algorithm Based on Improved YOLOv5

doi:10.3778/j.issn.1002-8331.2202-0093

Abstract

Abstract: YOLOv5 is currently one of the better-performing single-stage object detection algorithms, but its bounding-box regression is not precise enough, which makes it hard to apply in scenarios that demand a high intersection over union (IoU) for predicted boxes. Building on YOLOv5, this paper proposes YOLO-G, a new model with low hardware requirements, fast convergence, and high bounding-box accuracy. First, the feature pyramid network (FPN) is improved: cross-level connections fuse more features, which to some extent prevents the loss of shallow-layer semantic information, and the pyramid is deepened with a correspondingly added detection layer, so that the anchor boxes are laid out at more reasonable intervals. Second, a parallel attention mechanism is integrated into the network: the spatial attention module and the channel attention module are given equal priority, and attention information is extracted by weighted fusion, so that the network obtains mixed-domain attention according to how much weight it places on spatial and channel attention. Finally, the network is made lightweight by reducing its parameter count and computation, which prevents the added model complexity from degrading real-time performance. The algorithm is validated on the PASCAL VOC 2007 and 2012 datasets. Compared with YOLOv5s, YOLO-G has 4.7% fewer parameters and 47.9% less computation, while mAP@0.5 and mAP@0.5:0.95 increase by 3.1 and 5.6 percentage points, respectively.

Key words: YOLOv5 algorithm, feature pyramid network (FPN), attention mechanism, object detection
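
The abstract reports both mAP@0.5 and mAP@0.5:0.95, and the gap between them is where bounding-box precision shows up. The sketch below is illustrative only and is not taken from the paper: it computes IoU for axis-aligned boxes and counts how many IoU thresholds a prediction clears. It assumes the common COCO-style reading of mAP@0.5:0.95 (thresholds from 0.50 to 0.95 in steps of 0.05), and the function names and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed convention: 0.50, 0.55, ..., 0.95 (ten thresholds).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Return the IoU thresholds at which `pred` would match `gt`."""
    value = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if value >= t]


if __name__ == "__main__":
    gt = (0, 0, 100, 100)          # hypothetical ground-truth box
    loose = (10, 10, 110, 110)     # shifted by 10 px in x and y
    tight = (2, 2, 102, 102)       # shifted by 2 px in x and y

    print(round(iou(loose, gt), 4), len(thresholds_passed(loose, gt)))
    print(round(iou(tight, gt), 4), len(thresholds_passed(tight, gt)))
```

Both example predictions count as hits at IoU 0.5, but the loosely localized box clears only four of the ten thresholds while the tightly localized one clears nine. Under this convention, better box regression raises mAP@0.5:0.95 more than mAP@0.5, which is consistent with the larger gain the abstract reports for the stricter metric.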

QIU Tianheng, WANG Ling, WANG Peng, BAI Yan'e. Research on Object Detection Algorithm Based on Improved YOLOv5[J]. Computer Engineering and Applications, 2022, 58(13): 63-73.
