Improved YOLOv7 for UAV Image Object Detection

doi:10.3778/j.issn.1002-8331.2305-0264

Abstract

Abstract: Object detection in aerial images is of significant practical value for efficiently interpreting aerial imagery and for applications such as mapping, resource inventory, and urban and rural planning. To address the large variation in object scale, background interference, and the false and missed detection of small targets in UAV aerial images, this paper proposes AirYOLOv7, an improved aerial image object detection algorithm based on YOLOv7. First, AirYOLOv7 incorporates a three-dimensional attention mechanism in the feature extraction stage of the original network and a channel attention mechanism in the feature fusion stage, which helps the model focus on key information in the image. Second, because aerial images contain many small objects, the algorithm adds an extra prediction head dedicated to small objects and places a C3STB module before each prediction head to strengthen detection of objects at different scales. Third, because the IoU loss is highly sensitive to positional deviations of small objects, the Wasserstein distance is introduced into the original bounding box regression loss to measure the difference between small objects, which improves small-object detection. Experimental results on two public optical aerial datasets, DOTA and VisDrone, show that AirYOLOv7 achieves a mean average precision (mAP) of 78.65% and 51.79%, respectively, which is 1.92 and 2.28 percentage points higher than the original YOLOv7. These results validate the effectiveness of the proposed improvements on optical aerial images.

Key words: object detection, UAV images, attention mechanism, loss function, Swin Transformer, YOLOv7
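
The abstract's point about IoU being sensitive to small positional deviations can be illustrated with a minimal sketch. It assumes the normalized Gaussian Wasserstein distance commonly used for tiny objects, in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The abstract does not state how AirYOLOv7 combines this term with the original regression loss, so that step is not shown. The function names, the example boxes, and the normalizing constant c = 12.8 are illustrative assumptions, not values from the paper.

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def nwd(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    c is a dataset-dependent normalizing constant (assumed value here).
    """
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

# The same 3-pixel horizontal shift applied to a small and a large box.
small, small_shifted = (10.0, 10.0, 6.0, 6.0), (13.0, 10.0, 6.0, 6.0)
large, large_shifted = (100.0, 100.0, 60.0, 60.0), (103.0, 100.0, 60.0, 60.0)

print(f"small box: IoU={iou(small, small_shifted):.3f}, NWD={nwd(small, small_shifted):.3f}")
print(f"large box: IoU={iou(large, large_shifted):.3f}, NWD={nwd(large, large_shifted):.3f}")
```

In this example the 3-pixel shift reduces the IoU of the 6×6 box to about 0.333 while the 60×60 box keeps an IoU of about 0.905. The Wasserstein-based similarity is identical for both boxes because it depends only on the center offset and the size difference, which is why it gives a smoother signal for small objects.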

ZOU Zhentao, LI Zeping. Improved YOLOv7 for UAV Image Object Detection[J]. Computer Engineering and Applications, 2024, 60(8): 173-181.
