Detection Algorithm Based on Hierarchical Residuals and Bidirectional Feature Fusion Mechanism

doi:10.3778/j.issn.1002-8331.2409-0219

Abstract

Abstract: Existing YOLO-series object detectors offer excellent speed and real-time performance, but they still fall short in handling multi-scale objects and preserving boundary details. To address these problems, this paper proposes Res-YOLO, an improved object detection algorithm based on YOLOv8. Res-YOLO comprises three core modules: Res-SPPF for feature enhancement, RSBA for bidirectional feature fusion, and C2f_ODC for dynamic feature selection. Res-SPPF uses hierarchical residual connections and a multi-head attention mechanism to strengthen the model's multi-scale feature representation; RSBA applies an adaptive fusion of deep and shallow features to retain both boundary details and semantic information; C2f_ODC uses progressive learning to filter out unnecessary features step by step, thereby reducing model complexity. In addition, linear deformable convolution (LDConv) is introduced to handle objects with complex boundaries and irregular shapes. Experiments on the MS COCO 2017 dataset show that, compared with the original algorithm, Res-YOLO improves mAP by 2.9 percentage points while requiring 94% of the original algorithm's GFLOPs. Comparisons with other state-of-the-art detectors further confirm the effectiveness and competitiveness of Res-YOLO.

Key words: object detection, residual connection, multi-scale feature fusion, convolutional neural networks, attention mechanism
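
The abstract names hierarchical residual connections as one ingredient of Res-SPPF but does not spell out the connection pattern. The sketch below is an illustration only, not the authors' module: it assumes a Res2Net-like scheme in which the features are split into groups, the first group passes through unchanged, the second is transformed on its own, and every later group is added to the previous group's output before being transformed. The names `hierarchical_residual` and `halve` are hypothetical, and `halve` is a toy stand-in for a learned convolution.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def hierarchical_residual(
    groups: Sequence[Vector],
    transform: Callable[[Vector], Vector],
) -> List[Vector]:
    """Apply an assumed Res2Net-like hierarchical residual pass.

    y_1 = x_1
    y_2 = transform(x_2)
    y_i = transform(x_i + y_{i-1})  for i > 2
    """
    outputs: List[Vector] = []
    for index, group in enumerate(groups):
        if index == 0:
            # First group: identity branch, no transform.
            current = list(group)
        elif index == 1:
            # Second group: transformed without any carried-over input.
            current = transform(list(group))
        else:
            # Later groups: add the previous output, then transform.
            mixed = [x + y for x, y in zip(group, outputs[-1])]
            current = transform(mixed)
        outputs.append(current)
    return outputs


def halve(vector: Vector) -> Vector:
    """Toy stand-in for a learned convolution: scale every value by 0.5."""
    return [0.5 * value for value in vector]


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for output in hierarchical_residual(features, halve):
        print(output)
```

Because each later group sees the output of the one before it, information passes through a growing number of transforms from group to group, which is the usual motivation for this kind of connection in multi-scale feature extraction.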

LENG Qiangkui, LU Jianxu, MENG Xiangfu. Detection Algorithm Based on Hierarchical Residuals and Bidirectional Feature Fusion Mechanism[J]. Computer Engineering and Applications, 2025, 61(19): 179-189.
