Improved Target Detection Algorithm for UAV Images with RT-DETR

doi:10.3778/j.issn.1002-8331.2405-0331

Abstract

To address the low detection accuracy in images captured by light and small unmanned aerial vehicles (UAVs), which stems from flexible and diverse targets and from complex, changing environments, this paper proposes a UAV target detection algorithm based on an improved RT-DETR. The ResNet-r18 backbone is modified with lightweight SimAM attention and inverted residual modules to strengthen the model's feature extraction. A cascaded group attention mechanism is used to optimize the inverted residual modules and the feature interaction module, which improves feature selection and yields finer-grained detection information. A 160×160 detection layer is added to the neck network to improve the perception of small targets during feature fusion. Experiments on the VisDrone2019 dataset show that the improved model has fewer parameters and higher detection accuracy. Further experiments on the Alver_Lab_Ulastirma and HIT-UAV datasets confirm the effectiveness and robustness of the proposed improvements.
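The SimAM attention mentioned in the abstract is parameter-free: each activation is reweighted according to how far it deviates from its channel mean. The following is a minimal, dependency-free sketch of that reweighting for a single channel, following the commonly published SimAM formulation. It is not the authors' implementation; the function names `sigmoid` and `simam_channel`, the nested-list input format, and the default `e_lambda=1e-4` are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function used to squash the SimAM energy term into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def simam_channel(x, e_lambda=1e-4):
    """Reweight one feature-map channel (a 2D list of floats) SimAM-style.

    Hypothetical helper for illustration only: a real model would apply
    the same arithmetic to every channel of a batched tensor.
    """
    flat = [v for row in x for v in row]
    n = len(flat) - 1  # number of other activations in the channel
    if n < 1:
        raise ValueError("channel must contain at least two activations")

    mu = sum(flat) / len(flat)
    # Squared deviation of every activation from the channel mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in x]
    # Channel variance estimate; e_lambda keeps the division stable.
    var = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(x, sq_dev):
        out.append([
            v * sigmoid(d / (4.0 * (var + e_lambda)) + 0.5)
            for v, d in zip(row, dev_row)
        ])
    return out


# A channel with one strong activation in the centre.
channel = [
    [1.0, 1.0, 1.0],
    [1.0, 5.0, 1.0],
    [1.0, 1.0, 1.0],
]
weighted = simam_channel(channel)
centre_weight = weighted[1][1] / channel[1][1]
corner_weight = weighted[0][0] / channel[0][0]
```

In this sketch the centre activation, which stands out from its neighbours, receives a larger weight than the uniform background activations, and no learnable parameters are involved. That property is what makes the module lightweight.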

Key words: small target detection, detection Transformer (DETR), attention mechanism, Transformer, residual connection

JIANG Maoxiang, SI Zhanjun, WANG Xiaozhe. Improved Target Detection Algorithm for UAV Images with RT-DETR[J]. Computer Engineering and Applications, 2025, 61(1): 98-108.
