Computer Engineering and Applications ›› 2026, Vol. 62 ›› Issue (8): 93-104. DOI: 10.3778/j.issn.1002-8331.2507-0309

• Special Topic on Object Detection •

Improved RT-DETR Algorithm for Small Object Detection in UAV Images

GUO Jie, HU Jianlong+, ZHANG Junchao, WANG Yuxiang, ZHANG Xinyi

  1. College of Networking and Data Center, Northwest University, Xi'an 710127, China
    + Corresponding author E-mail: hujl8735@163.com
  • Received: 2025-07-23 Revised: 2025-10-09 Online: 2026-04-15 Published: 2026-04-15

Abstract: Objects in UAV aerial imagery are small, have blurred features, and are frequently occluded. To address these challenges, an improved real-time detection Transformer (RT-DETR) algorithm is proposed to raise detection accuracy and robustness in complex scenes. In the backbone, a feature enhancement module, C2f synergistic multi-attention Transformer (C2f_SMT), combines the synergistic multi-attention (SMA) mechanism with a Transformer structure to fuse shallow detail features with high-level semantic information, strengthening feature representation for small objects. To make feature interaction more efficient, a dual-attention feature interaction (DAFI) mechanism is proposed; its parallel, lightweight design reduces model complexity while preserving global feature modeling. Because small-object features are easily lost during multi-scale fusion, a cross-OmniKernel and small-target preservation feature fusion module (COSPFM) is introduced to improve the fusion scheme and strengthen multi-scale perception of small objects. On the VisDrone2019 dataset, the improved model achieves an mAP@0.5 of 51.0% and an mAP@0.5:0.95 of 31.5%, which are 3.1 and 2.2 percentage points higher, respectively, than the baseline RT-DETR. To evaluate generalization, the improved algorithm is also tested on the UAVDT dataset, where detection accuracy improves for each traffic category (such as car, truck, and bus) and mAP@0.5 reaches 34.4%. These results support the effectiveness and generalizability of the proposed method for small object detection across multiple scenarios.

Key words: small object detection, UAV imagery, RT-DETR, feature enhancement, feature interaction, multi-scale feature fusion
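
The two metrics quoted in the abstract differ only in how strictly a predicted box must overlap the ground truth. The following is a minimal sketch, not the authors' evaluation code: it assumes the common COCO-style convention in which mAP@0.5:0.95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05, and the helper names (`iou`, `thresholds_passed`, `implied_baseline`) and the example boxes are hypothetical. It shows how a 2-pixel offset on a 20×20 pixel object already fails the stricter thresholds, and how the baseline scores implied by the reported gains follow by subtraction.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """IoU thresholds at which `pred` would count as a match for `gt`."""
    value = iou(pred, gt)
    return [t for t in THRESHOLDS if value >= t]


def implied_baseline(improved, gain):
    """Baseline score implied by an improved score and its reported gain."""
    return round(improved - gain, 1)


# Hypothetical small object: 20x20 px ground truth, prediction shifted by 2 px
gt = (100, 100, 120, 120)
pred = (102, 102, 122, 122)

print(round(iou(pred, gt), 3))        # 0.681
print(thresholds_passed(pred, gt))    # [0.5, 0.55, 0.6, 0.65]

# Baseline RT-DETR scores implied by the abstract's VisDrone2019 figures
print(implied_baseline(51.0, 3.1))    # 47.9 (mAP@0.5)
print(implied_baseline(31.5, 2.2))    # 29.3 (mAP@0.5:0.95)
```

Under this assumption, the shifted box counts as a correct detection at only four of the ten thresholds, which is one reason mAP@0.5:0.95 tends to be much lower than mAP@0.5 for small objects.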