Computer Engineering and Applications, 2024, Vol. 60, Issue (8): 173-181. DOI: 10.3778/j.issn.1002-8331.2305-0264

Graphics and Image Processing

Improved YOLOv7 for UAV Image Object Detection

ZOU Zhentao, LI Zeping   

1. State Key Laboratory of Public Big Data, Guiyang 550025, China
2. School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
Online: 2024-04-15 Published: 2024-04-15

Abstract: Object detection in aerial images is of significant practical value for the efficient interpretation of aerial imagery and for applications such as mapping, resource surveys, and urban and rural planning. UAV aerial images pose several challenges: object scales vary widely, backgrounds cause interference, and tiny objects are frequently misdetected or missed. To address these problems, this paper proposes AirYOLOv7, an improved algorithm based on YOLOv7. First, AirYOLOv7 incorporates a three-dimensional attention mechanism into the feature extraction stage and a channel attention mechanism into the feature fusion stage of the original network, helping the model focus on the key information in an image. Second, because aerial images contain many tiny objects, the algorithm adds an extra prediction head dedicated to tiny objects and inserts a C3STB module before each prediction head to strengthen detection of objects at different scales. In addition, because the IoU loss is highly sensitive to positional deviations of tiny objects, the algorithm introduces the Wasserstein distance into the original bounding box regression loss to measure the difference between tiny objects, further improving tiny-object detection. Experimental results on two public optical aerial datasets, DOTA and VisDrone, show that AirYOLOv7 achieves a mean average precision (mAP) of 78.65% and 51.79%, respectively, improvements of 1.92 and 2.28 percentage points over the original YOLOv7. These results validate the effectiveness of the proposed improvements on optical aerial images.
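The abstract does not state the exact form of the Wasserstein term added to the box regression loss. A widely used formulation for tiny objects is the normalized Gaussian Wasserstein distance (NWD), which models each box as a 2-D Gaussian. The Python sketch below is only an illustration under stated assumptions: it assumes the NWD formulation, a normalizing constant `c = 12.8`, plain IoU in place of YOLOv7's CIoU, and a hypothetical 50/50 blend (`ratio`) between the IoU and NWD loss terms. None of these choices is confirmed by the paper, and the function names are hypothetical.

```python
import math

# Boxes are (cx, cy, w, h) in pixels: center coordinates, width, height.


def iou(a, b):
    """Plain IoU of two center-format boxes.

    YOLOv7 itself uses CIoU; plain IoU keeps this sketch short.
    """
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Each box is modelled as a 2-D Gaussian N(mu, Sigma) with
    mu = (cx, cy) and Sigma = diag(w^2 / 4, h^2 / 4). For such Gaussians the
    squared 2-Wasserstein distance reduces to the squared Euclidean distance
    between the vectors (cx, cy, w/2, h/2). The constant c is dataset
    dependent; 12.8 is an assumed value, not one taken from the paper.
    """
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def box_loss(pred, target, ratio=0.5):
    """Hypothetical blend of the IoU loss and the NWD loss (ratio is assumed)."""
    return (1 - ratio) * (1 - iou(pred, target)) + ratio * (1 - nwd(pred, target))


if __name__ == "__main__":
    small_t, small_p = (10, 10, 6, 6), (13, 10, 6, 6)          # 6x6 px box, 3 px shift
    large_t, large_p = (100, 100, 60, 60), (103, 100, 60, 60)  # 60x60 px box, same shift
    print(iou(small_t, small_p), iou(large_t, large_p))  # ~0.333 vs ~0.905
    print(nwd(small_t, small_p), nwd(large_t, large_p))  # identical, ~0.791
```

With the same 3 px shift, IoU falls to about 0.33 for the 6x6 px box but stays near 0.90 for the 60x60 px box, whereas NWD returns the same value (about 0.79) for both and remains non-zero even after the boxes stop overlapping. This illustrates the sensitivity of IoU to positional deviations of tiny objects that motivates the Wasserstein term.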

Key words: object detection, UAV images, attention mechanism, loss function, Swin Transformer, YOLOv7