Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 93-110. DOI: 10.3778/j.issn.1002-8331.2501-0158

• Special Issue on Object Detection •

Underwater Object Detection Combining Target Feature Enhancement and Semantic-Location Path Aggregation

SONG Wei, NI Zhou, LIANG Jichen, ZHANG Minghua, WANG Jian   

  1. College of Information Technology, Shanghai Ocean University, Shanghai 200136, China
  • Online: 2025-08-01 Published: 2025-07-31

结合目标特征增强与语义-位置路径聚合的水下目标检测

宋巍,倪舟,梁纪辰,张明华,王建   

  1. 上海海洋大学 信息学院,上海 200136

Abstract: To address the missed and false detections caused by poor underwater image quality, multi-scale targets, and severe occlusion, an underwater object detection (UOD) model built on the RT-DETR framework is proposed. First, a multi-scale injection for edge features module (MSI-Edge) injects edge information into the deep layers of the network, strengthening the model's perception of small objects. Second, a global-local feature enhancement module (GLF-Enhance) replaces the traditional multi-head self-attention mechanism in the encoder, improving the learning of global and local object information while accelerating inference. Third, a new semantic-location path aggregation network (SL-PAN) is designed to mitigate the degradation of information transmission during multi-scale feature fusion: it uses high-level features as weights to guide the learning of semantic information in low-level features, and then uses low-level features as weights to guide the learning of positional information in high-level features. Experiments on public underwater datasets show that, compared with the baseline RT-DETR (ResNet50 backbone), the proposed model improves AP, AP50, and AP75 by approximately 3.2, 3.0, and 2.7 percentage points on the URPC dataset and by 2.9, 2.7, and 3.0 percentage points on the DUO dataset, while also reducing false detection and missed detection rates. Ablation studies validate the effectiveness of each module. Compared with mainstream object detectors and the latest underwater object detection methods, the proposed model achieves competitive overall performance.

Key words: underwater object detection, semantic-location path aggregation network, multi-scale injection for edge features, RT-DETR model, global-local feature enhancement
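
The cross-scale weighting idea behind SL-PAN, as described in the abstract, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the weights are computed, so the sigmoid gating, nearest-neighbour upsampling, 2x2 average pooling, and the function names `semantic_guided` and `location_guided` are all assumptions made here for illustration only.

```python
import numpy as np

def sigmoid(x):
    # Squash activations into (0, 1) so they can act as multiplicative weights.
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(x):
    # Nearest-neighbour upsampling: (C, H, W) -> (C, 2H, 2W).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    # 2x2 average pooling: (C, H, W) -> (C, H/2, W/2); H and W must be even.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def semantic_guided(low, high):
    # ASSUMPTION: high-level features, upsampled and gated by a sigmoid,
    # weight the low-level map (semantic guidance).
    return low * sigmoid(upsample2x(high))

def location_guided(high, low):
    # ASSUMPTION: low-level features, pooled and gated by a sigmoid,
    # weight the high-level map (positional guidance).
    return high * sigmoid(downsample2x(low))

# Hypothetical feature maps: 8 channels, two adjacent pyramid levels.
rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))   # fine resolution, low level
high = rng.standard_normal((8, 8, 8))    # coarse resolution, high level

low_out = semantic_guided(low, high)
high_out = location_guided(high, low)
print(low_out.shape, high_out.shape)
```

In this sketch each output keeps the resolution of the map being guided, and the gate only rescales existing activations. A real fusion network would typically add learned convolutions and residual connections around these steps.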

摘要: 针对水下图像质量差、目标多尺度和严重遮挡导致的漏检和误检等问题,提出一种结合目标信息增强与语义-位置路径聚合的水下目标检测模型。该模型以RT-DETR框架为基础,提出了边缘特征多尺度注入模块(multi-scale injection for edge features module,MSI-Edge),将边缘信息注入深层网络中,强化了模型对小目标的感知能力;同时,提出了全局-局部特征增强模块(global-local feature enhancement module,GLF-Enhance)来替代编码器中的传统多头自注意力机制,增强对目标全局和局部信息的学习能力,并加速模型推理;进而,设计了一种新的结合语义-位置路径聚合网络(semantic-location path aggregation network,SL-PAN),利用高层特征作为权重来指导低层特征中的语义信息学习,再使用低层特征作为权重来指导高层特征中的位置信息学习,从而有效缓解多尺度特征融合过程中信息传递退化的问题。在公开水下数据集上进行实验验证,相较基准模型RT-DETR(ResNet50主干网络),在URPC数据集上AP、AP50、AP75指标分别提升了约3.2、3.0和2.7个百分点;在DUO数据集上分别提升了2.9、2.7、3.0个百分点,同时有效降低了误检和漏检率。消融实验验证了各模块的有效性。整体性能与主流目标检测器及最新水下目标检测器相比,达到了较好水平。

关键词: 水下目标检测, 语义-位置路径聚合网络, 边缘特征多尺度注入, RT-DETR模型, 全局-局部特征增强