Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 198-206. DOI: 10.3778/j.issn.1002-8331.2404-0114

• Graphics and Image Processing •

Improved Detection Algorithm of RTDETR for UAV Small Target

HU Jiale, ZHOU Min, SHEN Fei

  1. College of Mechanical Automation, Wuhan University of Science and Technology, Wuhan 430080, China
  • Online: 2024-10-15 Published: 2024-10-15

Abstract: To address the challenges of UAV target detection, namely small and densely packed targets, complex backgrounds, and limited hardware resources, an improved RTDETR detector is proposed. In the backbone network, a lightweight multi-scale attention feature extraction module (Rep-FasterNet EMA block) is designed: RepConv is used to improve the FasterNet block, and a multi-scale attention module (EMA) is introduced, which strengthens spatial feature extraction and reduces computational redundancy. In the encoder, the DyASF feature fusion structure replaces CCFM. Its dynamic scale sequence feature fusion (DySSFF) module and triple feature encoder (TPE) module avoid the loss of small-target feature information caused by upsampling and downsampling, enrich the detail available for small-target detection, and strengthen the network's feature fusion across scales. For the loss function, Focaler-Shape-IoU, which combines the advantages of Focaler-IoU and Shape-IoU, is proposed to replace the GIoU loss of the original model. It injects the shape and scale information of the bounding box, focuses on hard samples, and improves bounding box regression. Experimental results show that, on the VisDrone2019 dataset, the improved model raises mAP0.5 and mAP0.5:0.95 by 1.6 and 0.7 percentage points respectively, while also reducing the weight file size to some extent, which verifies the effectiveness of the improved model.

Key words: UAV remote sensing, small target detection, real-time detection Transformer (RTDETR), multi-scale attention
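
The abstract does not give the formula for Focaler-Shape-IoU, so the following is only a sketch of how such a loss might be assembled from the separately published definitions of Shape-IoU (a distance term and a shape term weighted by the ground-truth box's width and height) and Focaler-IoU (a linear remapping of IoU over an interval [d, u]). The combination rule `L_shape + IoU - IoU_focaler`, the function names, and the default values of `d`, `u`, `scale`, and `theta` are assumptions for illustration, not the authors' confirmed implementation. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive width and height.

```python
import math


def iou_xyxy(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def focaler_shape_iou_loss(pred, gt, d=0.0, u=0.95, scale=0.0, theta=4.0):
    """Hypothetical Focaler-Shape-IoU loss for one box pair (illustrative only)."""
    iou = iou_xyxy(pred, gt)
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]

    # Shape-IoU weights, derived from the ground-truth box's width and height.
    ww = 2 * wg**scale / (wg**scale + hg**scale)
    hh = 2 * hg**scale / (wg**scale + hg**scale)

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw**2 + ch**2

    # Shape-weighted, normalized distance between box centers.
    dx = ((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2
    dy = ((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2
    dist = (hh * dx**2 + ww * dy**2) / c2

    # Shape cost penalizing width and height mismatch.
    omega_w = hh * abs(w - wg) / max(w, wg)
    omega_h = ww * abs(h - hg) / max(h, hg)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    shape_iou_loss = 1 - iou + dist + 0.5 * shape

    # Focaler-IoU: remap IoU linearly over [d, u], clipped to [0, 1].
    iou_focaler = min(1.0, max(0.0, (iou - d) / (u - d)))

    # Assumed combination: swap the plain IoU term for the remapped one.
    return shape_iou_loss + iou - iou_focaler
```

Under these assumptions, a perfectly matched box pair gives a loss of zero, and the loss grows as the boxes drift apart or differ in shape. Narrowing or shifting the interval [d, u] changes which IoU range contributes gradient, which is how the Focaler-IoU remapping shifts attention between easy and hard samples.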