Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (15): 107-114. DOI: 10.3778/j.issn.1002-8331.2303-0154

• Special Topic on Object Detection •

UAV Image Small Object Detection on Complex Background

WANG Xiaohong, HU Yu

  1. College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai 200125, China
  • Online: 2023-08-01 Published: 2023-08-01

Abstract: To address the low detection accuracy caused by complex backgrounds and small object features in UAV aerial images, a small object detection algorithm named EMT-ECoTNet is proposed as an improvement on YOLOv7-w6. Its ECoT Block combines the CoT module, which has the advantage of global modeling, with the MA-ECA channel attention module, which adds a max pooling layer (MaxPool) to capture more texture information from small objects; together they benefit small object feature extraction. M-SPPFCSPC, a spatial pyramid pooling structure with a large receptive field, further enhances the small object features. The EIoU loss function penalizes the width and height differences between the predicted and ground truth boxes separately, which improves convergence speed and accuracy. Experimental results show that EMT-ECoTNet achieves an mAP50 of 62.8% on the VisDrone dataset, 3.2 percentage points higher than the baseline model YOLOv7-w6, and outperforms mainstream algorithms on UAV small object detection tasks.
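The EIoU loss named in the abstract can be illustrated with a minimal sketch. It follows the commonly cited definition of EIoU (1 - IoU, plus a center-distance term, plus separate width and height terms, each normalized by the smallest enclosing box). It is not taken from the authors' implementation, which this page does not show. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Illustrative EIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Hypothetical helper: box format and eps are assumptions, not the paper's code.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Width and height of the predicted and ground truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height are penalized separately, each normalized by the
    # corresponding side of the enclosing box.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    # Same height, different width: only the IoU, center, and width terms contribute.
    print(eiou_loss((0, 0, 2, 2), (0, 0, 4, 2)))
```

For the example boxes, IoU is 0.5, the center term is 0.05, the width term is 0.25, and the height term is 0, so the loss is approximately 0.8. Because width and height errors enter as separate terms, a box with the right aspect ratio but the wrong size is still penalized on each side independently.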

Key words: UAV images, complex background, small object detection, attention mechanism, spatial pyramid pooling