LFDS-YOLO: Lightweight Aerial Pavement Damage Detection Algorithm with Multi-Scale Feature Fusion

doi:10.3778/j.issn.1002-8331.2503-0397

Abstract: Existing aerial pavement distress detection algorithms suffer from redundant feature extraction and high computational complexity, computationally inefficient global attention, and convolutional attention that captures limited multi-dimensional information and insufficient semantics, all of which constrain detection accuracy and real-time performance. To address these issues, this paper proposes LFDS-YOLO, a lightweight detection network based on multi-scale feature fusion and improved attention mechanisms. The feature pyramid is reconstructed as LF_PANet (lightweight fusion path aggregation network) by removing large-scale feature branches, and a dynamic feature extraction block (DFEB) is designed to allocate resources adaptively. A multi-head column attention mechanism (MHCol-Attn) is introduced and accelerated with FlashAttention to improve training efficiency. A superior lightweight coordinate attention (SLCA) module is proposed to strengthen the extraction of multi-dimensional information in convolutional attention. Unstructured pruning is applied to compress the model and increase detection speed. Experiments on the public UAV-PDD2023 dataset show that LFDS-YOLO improves mean average precision (mAP) by 3.5 percentage points over YOLOv11s while reducing the number of parameters, computational complexity, and model size by 53.2%, 6.5%, and 52.2%, respectively, and reaches a detection speed of 95 FPS, demonstrating its effectiveness for aerial pavement distress detection.

Key words: pavement defect detection, YOLOv11s, feature fusion, attention mechanism, lightweight
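
MHCol-Attn is described here only at a high level. As a rough illustration, assuming it restricts multi-head self-attention to vertical strips of the feature map (in the spirit of the area attention used in YOLOv12), the NumPy sketch below partitions a single (C, H, W) map into `num_areas` column strips and runs multi-head attention inside each strip. The function name `column_area_attention`, the random projection matrices, and the absence of batching, positional encoding, and an output projection are simplifying assumptions, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the per-row maximum before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def column_area_attention(feat, wq, wk, wv, num_heads, num_areas):
    """Hypothetical multi-head self-attention restricted to vertical strips.

    feat: (C, H, W) feature map; wq, wk, wv: (C, C) query/key/value projections.
    The width axis is cut into `num_areas` equal column strips, and each token
    attends only to tokens inside its own strip.
    """
    c, h, w = feat.shape
    assert c % num_heads == 0 and w % num_areas == 0
    d = c // num_heads                       # channels per head
    strip_w = w // num_areas                 # columns per strip
    out = np.empty_like(feat)
    for a in range(num_areas):
        cols = slice(a * strip_w, (a + 1) * strip_w)
        # Flatten the strip into N = H * strip_w tokens of dimension C.
        tokens = feat[:, :, cols].reshape(c, -1).T
        q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
        heads = []
        for i in range(num_heads):
            s = slice(i * d, (i + 1) * d)
            scores = q[:, s] @ k[:, s].T / np.sqrt(d)   # (N, N) inside the strip
            heads.append(softmax(scores) @ v[:, s])
        merged = np.concatenate(heads, axis=1)          # (N, C)
        # Undo the flattening and write the strip back into place.
        out[:, :, cols] = merged.T.reshape(c, h, strip_w)
    return out


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 6))                   # C=8, H=4, W=6
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = column_area_attention(feat, wq, wk, wv, num_heads=2, num_areas=3)
print(out.shape)                                        # (8, 4, 6)
```

Because tokens never attend across strips, each strip builds an (H·W/num_areas) × (H·W/num_areas) score matrix, so the total score computation shrinks by roughly a factor of `num_areas` compared with global attention over all H·W tokens.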
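
FlashAttention (Dao et al., 2022) speeds up attention without approximating it: keys and values are processed in tiles, and a running row maximum and running softmax denominator rescale the partial result, so the full N × N score matrix is never stored at once. The hypothetical NumPy sketch below reproduces only this online-softmax recurrence and checks it against naive attention; the real kernel also tiles the queries and keeps tiles in on-chip GPU memory, which is where its speed-up comes from. The function names are illustrative assumptions.

```python
import numpy as np


def naive_attention(q, k, v):
    # Reference version: materialises the full (N, N) score matrix.
    s = q @ k.T / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v


def tiled_attention(q, k, v, block=4):
    """Exact attention over key/value tiles using an online softmax.

    Only an (N, block) slice of scores exists at a time. `m` is the running
    row maximum, `l` the running softmax denominator, `acc` the running
    unnormalised output; all are rescaled whenever the maximum grows.
    """
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((n, 1), -np.inf)
    l = np.zeros((n, 1))
    acc = np.zeros((n, v.shape[1]))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                              # scores for this tile
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        p = np.exp(s - m_new)
        corr = np.exp(m - m_new)                          # 0 on the first tile
        l = l * corr + p.sum(axis=1, keepdims=True)
        acc = acc * corr + p @ vb
        m = m_new
    return acc / l


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((10, 8)) for _ in range(3))
print(np.allclose(tiled_attention(q, k, v, block=3), naive_attention(q, k, v)))  # True
```

In PyTorch 2.x, `torch.nn.functional.scaled_dot_product_attention` can dispatch to a FlashAttention kernel on supported GPUs; which implementation LFDS-YOLO relies on is not stated here.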
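
SLCA is presented as an improved coordinate attention, and its specific modifications are not described here. For reference, the sketch below shows the baseline coordinate attention of Hou et al. (2021) that the name points to: the map is average-pooled separately along width and height, passed through a shared 1×1 transform, split back into a height gate and a width gate, and each position is reweighted by both gates. It handles a single (C, H, W) map in NumPy with BatchNorm omitted and ReLU standing in for the original activation; these simplifications and the function name `coordinate_attention` are assumptions, and this is not SLCA itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w1, wh, ww):
    """Baseline coordinate attention on one (C, H, W) feature map (sketch).

    w1: (C_mid, C) shared 1x1 transform; wh, ww: (C, C_mid) direction-specific
    1x1 transforms. BatchNorm is omitted; ReLU replaces the original activation.
    """
    h = x.shape[1]
    z_h = x.mean(axis=2)                                  # (C, H): average over width
    z_w = x.mean(axis=1)                                  # (C, W): average over height
    # A 1x1 conv on a 1-D sequence is a matrix product over the channel axis.
    f = np.maximum(w1 @ np.concatenate([z_h, z_w], axis=1), 0.0)  # (C_mid, H + W)
    g_h = sigmoid(wh @ f[:, :h])                          # (C, H) gate per row
    g_w = sigmoid(ww @ f[:, h:])                          # (C, W) gate per column
    # Position (i, j) in channel c is scaled by g_h[c, i] * g_w[c, j].
    return x * g_h[:, :, None] * g_w[:, None, :]


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 5, 7))
c_mid = 16 // 4                                           # reduction ratio r = 4
y = coordinate_attention(x, rng.standard_normal((c_mid, 16)),
                         rng.standard_normal((16, c_mid)),
                         rng.standard_normal((16, c_mid)))
print(y.shape)                                            # (16, 5, 7)
```

Unlike a purely channel-wise gate, the two direction-specific gates keep positional information along each axis, which is the property an improved variant such as SLCA would build on.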
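
The pruning criterion is not specified here. A common form of unstructured pruning, popularized by Han et al. (2015), zeroes the individual weights with the smallest magnitudes. The hypothetical sketch below applies one global magnitude threshold across several layers stored as a dict of NumPy arrays; the function name `global_magnitude_prune`, the dict layout, and the global (rather than per-layer) threshold are assumptions for illustration.

```python
import numpy as np


def global_magnitude_prune(weights, sparsity):
    """Global unstructured magnitude pruning (illustrative sketch).

    weights: dict mapping a layer name to its weight array.
    Zeroes the `sparsity` fraction of entries with the smallest absolute
    value across all layers; returns pruned copies and boolean keep-masks.
    """
    magnitudes = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(round(sparsity * magnitudes.size))            # number of weights to drop
    if k == 0:
        threshold = -np.inf                               # keep everything
    else:
        threshold = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    # Ties at the threshold are all pruned, so slightly more than k may be removed.
    masks = {name: np.abs(w) > threshold for name, w in weights.items()}
    pruned = {name: w * masks[name] for name, w in weights.items()}
    return pruned, masks


rng = np.random.default_rng(0)
layers = {"conv1": rng.standard_normal((16, 3, 3, 3)),
          "conv2": rng.standard_normal((32, 16, 3, 3))}
pruned, masks = global_magnitude_prune(layers, sparsity=0.5)
kept = sum(int(m.sum()) for m in masks.values())
total = sum(m.size for m in masks.values())
print(kept, total)
```

PyTorch offers the same global operation through `torch.nn.utils.prune.global_unstructured` with `pruning_method=prune.L1Unstructured`. A mask alone leaves tensor shapes unchanged, so reductions in stored size or latency depend on how the sparse weights are subsequently stored and executed.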

LI Yong, SHEN Jian. LFDS-YOLO: Lightweight Aerial Pavement Damage Detection Algorithm with Multi-Scale Feature Fusion[J]. Computer Engineering and Applications, 2025, 61(21): 81-93.