Improved YOLOv5s Small Object Detection Algorithm in UAV View

doi:10.3778/j.issn.1002-8331.2307-0223

Abstract

To address the long distance between a UAV and its targets during flight, the large variation in the size of the photographed objects, and object occlusion, this paper proposes BD-YOLO, an improved YOLOv5s-based algorithm for small object detection from the UAV perspective. In the feature fusion network, bi-level routing attention (BRA) filters out the least relevant features in the feature map in a dynamic, sparse manner and retains the features of important regions, which improves the feature extraction ability of the model. Because the feature map loses much of its location and feature information after repeated downsampling, a dynamic detection head (DyHead) that incorporates attention mechanisms is adopted; it unifies scale awareness, spatial awareness, and task awareness to achieve a stronger feature representation. The Focal-EIoU loss function is used to address the inaccurate regression results produced by the CIoU loss in YOLOv5s, improving the detection accuracy of the model for small objects. Experimental results on the VisDrone2019-DET dataset show that BD-YOLO raises mean average precision (mAP@0.5) by 0.062 over YOLOv5s and detects small objects better than other mainstream models.

Key words: unmanned aerial vehicle perspective, YOLOv5s, small object, attention mechanism, loss function
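
The abstract names the Focal-EIoU loss without stating it. As a rough illustration only, the sketch below follows the formulation commonly given for Focal-EIoU: the EIoU loss adds a centre-distance term, a width term, and a height term to 1 - IoU, each normalised by the smallest enclosing box, and the result is weighted by IoU raised to a power gamma. The function names, the (x1, y1, x2, y2) box layout, and the default gamma = 0.5 are assumptions made for this example; they are not taken from the BD-YOLO implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Return (EIoU loss, IoU) for two boxes given as (x1, y1, x2, y2).

    Illustrative sketch; box layout and names are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centres, normalised by the
    # squared diagonal of the enclosing box
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalised by the enclosing box
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term, iou


def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Weight the EIoU loss by IoU**gamma (gamma=0.5 is an assumed default)."""
    loss, iou = eiou_loss(pred, target, eps)
    return (iou ** gamma) * loss


if __name__ == "__main__":
    # Two 2x2 boxes offset by one unit in x and y
    print(focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

Unlike CIoU, which penalises a mismatch in aspect ratio, this form penalises width and height differences directly. With the IoU**gamma weighting written this way, a pair of boxes that does not overlap at all contributes zero loss, which is worth keeping in mind when reading the sketch.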


WU Mingjie, YUN Lijun, CHEN Zaiqing, ZHONG Tianze. Improved YOLOv5s Small Object Detection Algorithm in UAV View[J]. Computer Engineering and Applications, 2024, 60(2): 191-199.

