AEM-YOLOv8s: Small Target Detection Algorithm for UAV Aerial Images

doi:10.3778/j.issn.1002-8331.2403-0256

Abstract

To address the low accuracy, missed detections, occlusion, and large parameter counts that affect small object detection in UAV aerial images, the AEM-YOLOv8s algorithm is proposed. Within the C2f module, the strengths of AKConv (alterable kernel convolution) and EMA (efficient multi-scale attention) are combined to design the C2f-BE module, which improves the algorithm's feature processing while reducing the model's parameter count. A small object detection layer and a BiFPN structure are introduced; their cross-scale connections and weighted feature fusion retain more shallow features and further reduce the number of parameters. A multi-scale feature fusion branch merges shallow features, which carry more small object information, with deeper semantic features, reducing missed detections under occlusion and improving small object detection performance. Experiments on the public VisDrone2019 dataset show that AEM-YOLOv8s achieves an mAP50 of 50.1% and an mAP50:95 of 31.1%, improvements of 10.8 and 7.6 percentage points over YOLOv8s, respectively, while reducing the parameter count by 32.2%.

Key words: YOLOv8s, C2f-BE module, small object, multi-scale
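
The abstract mentions weighted feature fusion in the BiFPN structure but does not spell out the formula. As a minimal sketch, assuming the fast normalized fusion described for BiFPN in EfficientDet (each learnable weight is clipped to be non-negative and divided by the sum of all weights plus a small epsilon), the idea can be illustrated as follows. The function name `fast_normalized_fusion`, the flat-list feature representation, and the epsilon value are illustrative assumptions, not the authors' implementation.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Hypothetical helper for illustration only: each feature map is
    represented as a flat list of floats of identical length.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")

    # ReLU keeps every learnable weight non-negative.
    clipped = [max(0.0, float(w)) for w in weights]

    # eps avoids division by zero when all weights are clipped to 0.
    total = sum(clipped) + eps

    # Weighted sum at each position, divided by the weight total.
    return [
        sum(w * v for w, v in zip(clipped, values)) / total
        for values in zip(*features)
    ]


# Example: a shallow and a deep feature vector fused with equal weights.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]
fused = fast_normalized_fusion([shallow, deep], [1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the inputs; a weight that becomes negative during training is clipped to zero, so that input no longer contributes to the fused feature.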

JIANG Wei, WANG Wanhu, YANG Junjie. AEM-YOLOv8s: Small Target Detection Algorithm for UAV Aerial Images[J]. Computer Engineering and Applications, 2024, 60(17): 191-202.

蒋伟, 王万虎, 杨俊杰. AEM-YOLOv8s：无人机航拍图像的小目标检测[J]. 计算机工程与应用, 2024, 60(17): 191-202.
