Small Object Detection Algorithm Based on Improved YOLOv5 in UAV Image

doi:10.3778/j.issn.1002-8331.2212-0336

Abstract

Abstract: UAV aerial images are characterized by large variations in object scale and complex backgrounds, which make it difficult for existing detectors to detect small objects in them. To address false and missed detections of small objects in UAV images, Drone-YOLO, an improved YOLOv5 model, is proposed. A detection branch is added to improve detection capability at multiple scales. A feature pyramid network with multi-level information aggregation is designed to fuse information across layers. A feature fusion module based on a multi-scale channel attention mechanism is designed to increase the attention paid to small objects. The classification task of the prediction head is decoupled from the regression task, and the loss function is optimized with Alpha-IoU to improve detection accuracy. Experimental results on the VisDrone dataset show that Drone-YOLO improves AP50 by 4.91 percentage points over YOLOv5, with an inference latency of only 16.78 ms. Compared with other mainstream models, it detects small objects better and can effectively complete small object detection in UAV aerial images.

Key words: object detection, unmanned aerial vehicle (UAV), small object, attention mechanism, feature fusion, YOLO
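To make the Alpha-IoU loss named in the abstract concrete, the following is a minimal sketch of its basic form, L = 1 - IoU^alpha, for a single pair of axis-aligned boxes. It is an illustration under stated assumptions, not the Drone-YOLO implementation: the function names `iou` and `alpha_iou_loss`, the `(x1, y1, x2, y2)` box format, and the default `alpha=3.0` are assumptions made here, and the abstract does not say which Alpha-IoU variant or exponent the model uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred_box, target_box, alpha=3.0):
    """Basic Alpha-IoU loss: 1 - IoU ** alpha.

    alpha = 1 recovers the ordinary IoU loss. The default of 3.0 is an
    assumption for illustration, not a value taken from the abstract.
    """
    return 1.0 - iou(pred_box, target_box) ** alpha


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 0.0, 3.0, 2.0)
    print(alpha_iou_loss(pred, target, alpha=1.0))
    print(alpha_iou_loss(pred, target, alpha=3.0))
```

With `alpha` greater than 1, the loss stays close to 1 for poorly overlapping boxes and falls steeply only as IoU approaches 1, which places more weight on refining high-IoU predictions.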

摘要： 无人机航拍影像具有目标尺度变化大、背景复杂等诸多特性，导致现有的检测器难以检测出航拍影像中的小目标。针对无人机影像中小目标误检漏检的问题，提出了改进YOLOv5的算法模型Drone-YOLO。增加了检测分支以提高模型在多尺度下的检测能力。设计了多层次信息聚合的特征金字塔网络结构，实现跨层次信息的融合。设计了基于多尺度通道注意力机制的特征融合模块，提高对小目标的关注度。将预测头的分类任务与回归任务解耦，使用Alpha-IoU优化损失函数定义，提升模型检测的效果。通过无人机影像数据集VisDrone的实验结果表明，Drone-YOLO模型较YOLOv5模型在AP50指标上提高了4.91个百分点，推理延时仅需16.78 ms。对比其他主流模型对于小目标拥有更好的检测效果，能够有效完成无人机航拍影像的小目标检测任务。

关键词: 目标检测, 无人机, 小目标, 注意力机制, 特征融合, YOLO

XIE Chunhui, WU Jinming, XU Huaiyu. Small Object Detection Algorithm Based on Improved YOLOv5 in UAV Image[J]. Computer Engineering and Applications, 2023, 59(9): 198-206.

谢椿辉, 吴金明, 徐怀宇. 改进YOLOv5的无人机影像小目标检测算法[J]. 计算机工程与应用, 2023, 59(9): 198-206.
