Research on Improving YOLOv7’s Small Target Detection Algorithm

doi:10.3778/j.issn.1002-8331.2307-0004

Abstract

As deep learning has been applied ever more widely to object detection in China, detection of conventional large and medium-sized objects has made remarkable progress. However, owing to the inherent limitations of convolutional networks, small object detection still suffers from missed and false detections. Taking the VisDrone2019 and FloW-Img datasets as examples, this paper studies the YOLOv7 model and improves the ELAN module of its backbone network. A Focal NeXt block is integrated into the long and short gradient paths of the ELAN module to strengthen the quality of small-target features and enrich the contextual information carried by the output features. A RepLKDeXt module is introduced into the head network; it replaces the SPPCSPC module, which simplifies the overall model structure, and it also optimizes the ELAN-H structure through multiple channels, large convolutional kernels, and concatenation (Cat) operations. Finally, the SIoU loss function replaces the CIoU loss function to improve the robustness of the model. The results show that the improved YOLOv7 model has fewer parameters and lower computational complexity. Its detection performance remains approximately unchanged on the VisDrone2019 dataset, where small targets are dense, and increases by 9.05 percentage points on the FloW-Img dataset, where small targets are sparse. The model is thus further simplified and its range of application is broadened.
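The replacement of the CIoU loss by the SIoU loss can be illustrated with a minimal pure-Python sketch for a single pair of boxes. It is not the authors' implementation: the function names `iou`, `ciou_loss`, and `siou_loss` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive width and height, the formulas follow the commonly published CIoU and SIoU definitions (angle, distance, and shape costs), and the shape exponent `theta = 4.0` is an assumed default rather than a value taken from this paper.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def _geometry(pred, gt):
    """Widths, heights, center offsets, and enclosing-box size for two boxes."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    dx = (gt[0] + gt[2]) / 2 - (pred[0] + pred[2]) / 2
    dy = (gt[1] + gt[3]) / 2 - (pred[1] + pred[3]) / 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])  # enclosing box width
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])  # enclosing box height
    return pw, ph, gw, gh, dx, dy, cw, ch


def ciou_loss(pred, gt):
    """CIoU loss: 1 - IoU + center-distance penalty + aspect-ratio penalty."""
    pw, ph, gw, gh, dx, dy, cw, ch = _geometry(pred, gt)
    i = iou(pred, gt)
    distance = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - i) + v) if v > 0 else 0.0
    return 1 - i + distance + alpha * v


def siou_loss(pred, gt, theta=4.0):
    """SIoU loss: 1 - IoU + (distance cost + shape cost) / 2.

    theta is the shape-cost exponent (assumed default, not from this paper).
    """
    pw, ph, gw, gh, dx, dy, cw, ch = _geometry(pred, gt)
    i = iou(pred, gt)
    # Angle cost: 0 when the centers are axis-aligned, 1 at 45 degrees.
    sigma = math.hypot(dx, dy)
    if sigma > 0:
        sin_alpha = min(1.0, abs(dy) / sigma)
        angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2
    else:
        angle = 0.0
    # Distance cost, modulated by the angle cost through gamma.
    gamma = 2 - angle
    rho_x, rho_y = (dx / cw) ** 2, (dy / ch) ** 2
    distance = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
    # Shape cost: relative mismatch in width and height.
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta
    return 1 - i + (distance + shape) / 2


if __name__ == "__main__":
    prediction, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    print("CIoU loss:", ciou_loss(prediction, target))
    print("SIoU loss:", siou_loss(prediction, target))
```

Both losses are zero for identical boxes. The difference in this sketch is that SIoU additionally accounts for the direction of the offset between box centers through the angle cost, whereas CIoU penalizes only the center distance and the aspect-ratio mismatch.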

Key words: YOLOv7 model, small target detection, large convolutional kernels, loss function

LI Anda, WU Ruiming, LI Xudong. Research on Improving YOLOv7’s Small Target Detection Algorithm[J]. Computer Engineering and Applications, 2024, 60(1): 122-134.
