[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[2] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[3] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[4] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[5] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271.
[6] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[7] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020.
[8] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the European Conference on Computer Vision, 2016: 21-37.
[9] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[10] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of European Conference on Computer Vision, 2014: 740-755.
[11] LIANG Z, SHAO J, ZHANG D, et al. Small object detection using deep feature pyramid networks[C]//Proceedings of Pacific Rim Conference on Multimedia, 2018: 554-564.
[12] HE X J, SONG X N. Improved YOLOv4-tiny lightweight target detection algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 138-150.
[13] AMUDHAN A N, SUDHEER A P. Lightweight and computationally faster hypermetropic convolutional neural network for small size object detection[J]. Image and Vision Computing, 2022, 119: 104396.
[14] FU K, LI J, MA L, et al. Intrinsic relationship reasoning for small object detection[J]. arXiv:2009.00833, 2020.
[15] WANG C M, LIU H. YOLOv8-VSC: lightweight algorithm for strip surface defect detection[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 151-160.
[16] ZHOU J, ZHANG B, YUAN X, et al. YOLO-CIR: the network based on YOLO and ConvNeXt for infrared object detection[J]. Infrared Physics & Technology, 2023, 131: 104703.
[17] LENG J, MO M, ZHOU Y, et al. Pareto refocusing for drone-view object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 33(3): 1320-1334.
[18] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125.
[19] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 8759-8768.
[20] BIAN P C, ZHENG Z L, LI M L, et al. Attention fusion network based video super-resolution reconstruction[J]. Journal of Computer Applications, 2021, 41(4): 1012-1019.
[21] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13713-13722.
[22] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010.
[23] WANG J, CHEN K, XU R, et al. CARAFE: content-aware reassembly of features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 3007-3016.
[24] LI X, HU X, YANG J. Spatial group-wise enhance: improving semantic feature learning in convolutional networks[J]. arXiv:1905.09646, 2019.
[25] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12993-13000.
[26] ZHANG Y F, REN W, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-157.
[27] TONG Z, CHEN Y, XU Z, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism[J]. arXiv:2301.10051, 2023.
[28] GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression[J]. arXiv:2205.12740, 2022.
[29] MA S, XU Y. MPDIoU: a loss for efficient and accurate bounding box regression[J]. arXiv:2307.07662, 2023.
[30] DU D, ZHU P, WEN L, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019: 213-226.
[31] JOCHER G, STOKEN A, BOROVEC J, et al. ultralytics/yolov5: v5[EB/OL]. (2022-11-22)[2024-01-28]. https://github.com/ultralytics/yolov5.
[32] XU S, WANG X, LV W, et al. PP-YOLOE: an evolved version of YOLO[J]. arXiv:2203.16250, 2022.
[33] ZHU X, LYU S, WANG X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 2778-2788.
[34] TANG W, SUN J, WANG G. Horizontal feature pyramid network for object detection in UAV images[C]//Proceedings of the 2021 China Automation Congress, 2021: 7746-7750.
[35] LIU S, ZHA J, SUN J, et al. EdgeYOLO: an edge-real-time object detector[J]. arXiv:2302.07483, 2023.
[36] RUKHOVICH D, SOFIIUK K, GALEEV D, et al. IterDet: iterative scheme for object detection in crowded environments[C]//Proceedings of the Structural, Syntactic, and Statistical Pattern Recognition, 2021: 344-354.
[37] ZHU X, SU W, LU L, et al. Deformable DETR: deformable transformers for end-to-end object detection[J]. arXiv:2010.04159, 2020.