Improved RT-DETR Algorithm for Aerial Small Object Detection

doi:10.3778/j.issn.1002-8331.2407-0399

Abstract: Existing object detection algorithms are prone to missed and false detections of small objects in aerial images. To address this problem, an improved algorithm based on RT-DETR (real-time detection transformer) is proposed. Partial convolution (PConv) is introduced into the backbone network through a newly designed PConvBlock structure, and a BasicBlock-PConvBlock module built from PConvBlocks replaces the original BasicBlock, reducing the number of model parameters. The feature fusion module is optimized with a bidirectional feature pyramid network (BiFPN) structure, and the S2 feature is introduced to further improve small object detection. The CARAFE upsampling operator is introduced to strengthen the fast fusion of multi-scale features. Experiments on the VisDrone test set show that, compared with RT-DETR, the improved model has 13.9% fewer parameters while improving mAP0.5 and mAP0.5:0.95 by 2.4 and 1.9 percentage points, respectively. The improved model also outperforms RT-DETR on the TT100K and DOTA datasets. It improves detection accuracy while keeping the parameter count and computational cost small, meeting the requirements of real-time detection in drone aerial images.
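
Partial convolution, as introduced in FasterNet ("Run, don't walk: chasing higher FLOPS for faster neural networks"), applies a regular convolution to only the first c_p of the c input channels and passes the remaining channels through unchanged. The pure-Python sketch below illustrates that idea and the resulting parameter saving; it is not the authors' PConvBlock or BasicBlock-PConvBlock. The nested-list tensor layout, the helper names (`conv2d_same`, `partial_conv`, `conv_params`, `pconv_params`) and the ratio r = c_p/c = 1/4 (FasterNet's default) are assumptions made for illustration.

```python
from typing import List

# Hypothetical layout for this sketch: [channels][height][width] as nested lists.
Tensor3D = List[List[List[float]]]


def conv2d_same(x: Tensor3D, weight) -> Tensor3D:
    """Regular k x k convolution (cross-correlation), stride 1, zero padding, no bias.

    weight is indexed [out_channel][in_channel][ky][kx].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(weight[0][0])
    pad = k // 2
    out = []
    for o in range(len(weight)):
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for c in range(c_in):
                    for ky in range(k):
                        for kx in range(k):
                            yi, xj = i + ky - pad, j + kx - pad
                            if 0 <= yi < h and 0 <= xj < w:
                                acc += weight[o][c][ky][kx] * x[c][yi][xj]
                plane[i][j] = acc
        out.append(plane)
    return out


def partial_conv(x: Tensor3D, weight, c_p: int) -> Tensor3D:
    """PConv sketch: convolve the first c_p channels, copy the rest untouched."""
    if len(weight) != c_p or len(weight[0]) != c_p:
        raise ValueError("weight must map c_p channels to c_p channels")
    convolved = conv2d_same(x[:c_p], weight)
    untouched = [[row[:] for row in plane] for plane in x[c_p:]]
    return convolved + untouched


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a regular k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def pconv_params(c: int, k: int, ratio: float = 0.25) -> int:
    """Weights of a PConv that convolves only c_p = ratio * c channels."""
    c_p = int(c * ratio)
    return conv_params(c_p, c_p, k)


if __name__ == "__main__":
    # With c = 64 and k = 3: 36864 weights for a regular conv vs 2304 for PConv (1/16).
    print(conv_params(64, 64, 3), pconv_params(64, 3))
```

With r = 1/4 the convolved part carries (1/4)^2 = 1/16 of the weights and FLOPs of a full convolution at the same spatial size. In FasterNet, the untouched channels are mixed afterwards by the pointwise (1 x 1) convolutions that follow PConv inside each block.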

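BiFPN, proposed in EfficientDet, fuses feature maps that have been resized to the same resolution using learnable non-negative weights, a scheme called fast normalized fusion: O = sum_i(w_i * I_i) / (eps + sum_j w_j), with each w_i kept non-negative by a ReLU. The sketch below shows only this weighted-fusion step on flattened feature maps; the abstract does not detail how the improved model wires its BiFPN nodes or the added S2 level. The function name `fast_normalized_fusion` and eps = 1e-4 (EfficientDet's value) are assumptions for illustration.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    inputs: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-shape (flattened) feature maps with ReLU-clamped, normalized weights."""
    if not inputs or len(inputs) != len(weights):
        raise ValueError("need one weight per input feature map")
    size = len(inputs[0])
    if any(len(f) != size for f in inputs):
        raise ValueError("all inputs must be resized to the same shape before fusion")
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps each weight non-negative
    norm = eps + sum(w)                    # eps avoids division by zero
    return [
        sum(w[i] * inputs[i][p] for i in range(len(inputs))) / norm
        for p in range(size)
    ]


if __name__ == "__main__":
    top_down = [2.0, 4.0]  # e.g. an upsampled coarser level (made-up values)
    lateral = [4.0, 8.0]   # same-resolution backbone feature (made-up values)
    print(fast_normalized_fusion([top_down, lateral], [1.0, 1.0]))
```

With equal weights the result is approximately the element-wise mean of the inputs; a negative learned weight is clamped to zero, so that input is effectively switched off. Compared with the plain summation used in an FPN or PAN neck, these normalized weights let the network learn how much each resolution contributes to a fused node.
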
Key words: small object detection, lightweight, RT-DETR, partial convolution
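
CARAFE (content-aware reassembly of features) upsamples by a factor sigma. For every output location (i', j') it predicts a k_up x k_up reassembly kernel from the input content, normalizes that kernel with softmax, and takes the weighted sum of the k_up x k_up neighbourhood around the source location (floor(i'/sigma), floor(j'/sigma)). The sketch below covers only this reassembly step on a single channel. In the real operator the kernels come from a learned kernel-prediction module (channel compressor plus content encoder), so here the kernel logits are passed in directly. That input format, the function names and the defaults sigma = 2, k_up = 5 (the settings reported in the CARAFE paper) are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used to normalize each predicted kernel."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(
    x: List[List[float]],
    kernel_logits: List[List[Sequence[float]]],
    scale: int = 2,
    k_up: int = 5,
) -> List[List[float]]:
    """Reassembly step of CARAFE for one channel (sketch).

    x: H x W input feature map.
    kernel_logits[i'][j']: k_up*k_up logits for output location (i', j') of the
    (scale*H) x (scale*W) output. Hypothetical input format: the real operator
    predicts these with a learned kernel-prediction module.
    """
    h, w = len(x), len(x[0])
    r = k_up // 2
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for io in range(h * scale):
        for jo in range(w * scale):
            kernel = softmax(kernel_logits[io][jo])  # weights sum to 1
            i, j = io // scale, jo // scale          # source location floor(l' / scale)
            acc = 0.0
            for n in range(-r, r + 1):
                for m in range(-r, r + 1):
                    yi, xj = i + n, j + m
                    if 0 <= yi < h and 0 <= xj < w:  # zero padding outside x
                        acc += kernel[(n + r) * k_up + (m + r)] * x[yi][xj]
            out[io][jo] = acc
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    k_up = 3
    center = (k_up // 2) * k_up + k_up // 2
    # A kernel peaked at its centre reduces CARAFE to nearest-neighbour upsampling.
    peaked = [-1e9] * (k_up * k_up)
    peaked[center] = 0.0
    logits = [[peaked for _ in range(4)] for _ in range(4)]
    for row in carafe_reassemble(x, logits, scale=2, k_up=k_up):
        print(row)
```

The demo prints the 4 x 4 nearest-neighbour upsampling of x. With learned, content-dependent kernels instead of a fixed peak, each upsampled pixel becomes a different weighted mix of its neighbourhood, which is what distinguishes CARAFE from nearest-neighbour or bilinear upsampling.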

LIU Siyuan, GAO Kai, YONG Longquan. Improved RT-DETR Algorithm for Aerial Small Object Detection[J]. Computer Engineering and Applications, 2025, 61(4): 272-281.
