Underwater Object Detection Combining Target Feature Enhancement and Semantic-Location Path Aggregation

doi:10.3778/j.issn.1002-8331.2501-0158

Abstract

To reduce the missed and false detections caused by poor underwater image quality, multi-scale targets, and severe occlusion, this paper proposes an underwater object detection (UOD) model built on the RT-DETR framework. First, a multi-scale injection for edge features module (MSI-Edge) injects edge information into the deep layers of the network, strengthening the model's perception of small objects. Second, a global-local feature enhancement module (GLF-Enhance) replaces the conventional multi-head self-attention mechanism in the encoder, improving the learning of global and local object information while accelerating inference. Third, a semantic-location path aggregation network (SL-PAN) mitigates the degradation of information transfer during multi-scale feature fusion: it uses high-level features as weights to guide the learning of semantic information in low-level features, and low-level features as weights to guide the learning of positional information in high-level features. On public underwater datasets, the proposed model outperforms the baseline RT-DETR (ResNet50 backbone), improving AP, AP50, and AP75 by approximately 3.2, 3.0, and 2.7 percentage points on the URPC dataset and by 2.9, 2.7, and 3.0 percentage points on the DUO dataset, while also reducing the false detection and missed detection rates. Ablation studies validate the effectiveness of each module. Compared with mainstream object detectors and recent underwater object detection methods, the proposed model achieves competitive overall performance.

Keywords: underwater object detection, semantic-location path aggregation network, multi-scale injection for edge features, RT-DETR model, global-local feature enhancement
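
The cross-guidance idea behind SL-PAN (high-level features weighting low-level ones for semantics, and low-level features weighting high-level ones for location) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the weights are computed, so the sigmoid gates, the global and channel-wise averaging, the factor-of-2 pooling, and the function names `semantic_guide` and `location_guide` are all assumptions made for illustration. Feature maps are plain Python lists of shape channels x positions, with one spatial dimension for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_guide(low, high):
    """Reweight each channel of `low` by a gate derived from `high`.

    Assumption (not from the paper): the gate for a channel is the sigmoid
    of the global average of the matching high-level channel.
    """
    gates = [sigmoid(sum(ch) / len(ch)) for ch in high]
    return [[g * v for v in ch] for g, ch in zip(gates, low)]


def location_guide(low, high):
    """Reweight each position of `high` by a gate derived from `low`.

    Assumption (not from the paper): the gate is the sigmoid of the
    channel-mean of `low`, average-pooled by a factor of 2 so that it
    matches the coarser high-level resolution.
    """
    n_ch = len(low)
    n_pos = len(low[0])
    # Mean over channels at every low-level position.
    spatial = [sum(low[c][p] for c in range(n_ch)) / n_ch for p in range(n_pos)]
    # Average-pool pairs of positions down to the high-level resolution.
    pooled = [(spatial[2 * i] + spatial[2 * i + 1]) / 2 for i in range(len(high[0]))]
    gates = [sigmoid(s) for s in pooled]
    return [[g * v for g, v in zip(gates, ch)] for ch in high]


# Toy features: 2 channels; the low level has 4 positions, the high level has 2.
low = [[1.0, 0.0, 0.0, 1.0], [0.0, 2.0, 2.0, 0.0]]
high = [[0.0, 0.0], [4.0, 4.0]]

low_guided = semantic_guide(low, high)    # same shape as `low`
high_guided = location_guide(low, high)   # same shape as `high`
```

In this toy setup, the low-level channel paired with the strongly activated high-level channel keeps almost all of its magnitude, while the other channel is halved; a real network would typically learn these gates with convolutions rather than fixed averaging.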

SONG Wei, NI Zhou, LIANG Jichen, ZHANG Minghua, WANG Jian. Underwater Object Detection Combining Target Feature Enhancement and Semantic-Location Path Aggregation[J]. Computer Engineering and Applications, 2025, 61(15): 93-110.
