Improved YOLOv5 for Small Object Detection Algorithm

doi:10.3778/j.issn.1002-8331.2302-0157

Abstract

Although deep learning has made remarkable progress in detecting large and medium-sized objects, small object detection remains challenging because small objects occupy few pixels and convolutional networks have inherent limitations. This paper improves the YOLOv5 (You Only Look Once version 5) algorithm and proposes YOLO-S, a model targeted at small objects. First, using a cascade network, an output layer dedicated to small object detection is added to the original three output layers. Second, to supplement context information and suppress conflicts in multi-scale feature fusion, a new context information extraction module, CFM (Context Feature Module), and a module based on channel and spatial feature refinement, FSM (Feature Specify Module), are designed. Finally, the original nearest-neighbor interpolation upsampling is replaced with a newly designed Transpose module based on deconvolution, with the aim of recovering as much information as possible. The algorithm is validated on VisDrone2019, a dataset geared toward small objects. Experimental results show that the mAP@0.5 of YOLO-S is 6.9 percentage points higher than that of YOLOv5.

Key words: YOLOv5 (You Only Look Once version 5), small object detection, cascade network, context information, feature refinement
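The abstract's last modification swaps interpolation-based upsampling for a deconvolution-based Transpose module. The internals of that module are not given here, so the following is only a minimal single-channel sketch of the general idea, not the authors' implementation. The function names `nearest_upsample` and `transposed_conv2d` and the `learned` kernel weights are hypothetical, chosen for illustration. It shows that a stride-2 transposed convolution with an all-ones 2x2 kernel reproduces nearest-neighbor upsampling, while other (trainable) weights produce a different reconstruction.

```python
from typing import List

Grid = List[List[float]]


def nearest_upsample(x: Grid, scale: int = 2) -> Grid:
    """Nearest-neighbor upsampling: copy each value into a scale x scale block."""
    return [
        [x[i // scale][j // scale] for j in range(len(x[0]) * scale)]
        for i in range(len(x) * scale)
    ]


def transposed_conv2d(x: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Single-channel transposed convolution with no padding.

    Each input value scatters a scaled copy of the kernel into the output,
    shifted by `stride`; overlapping contributions are summed.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for di in range(kh):
                for dj in range(kw):
                    out[i * stride + di][j * stride + dj] += x[i][j] * kernel[di][dj]
    return out


feature = [[1.0, 2.0], [3.0, 4.0]]            # toy 2x2 feature map
ones = [[1.0, 1.0], [1.0, 1.0]]               # fixed kernel: mimics nearest-neighbor
learned = [[0.5, 0.25], [0.25, 0.0]]          # hypothetical trained weights

up_nearest = nearest_upsample(feature)
up_ones = transposed_conv2d(feature, ones)
up_learned = transposed_conv2d(feature, learned)
```

In this sketch the interpolation has no parameters, whereas the transposed convolution's kernel can be fitted during training, which is one plausible reading of why a learned upsampling layer might recover more information than a fixed one.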

YU Jun, JIA Yinshan. Improved YOLOv5 for Small Object Detection Algorithm[J]. Computer Engineering and Applications, 2023, 59(12): 201-207.
