Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (11): 272-283. DOI: 10.3778/j.issn.1002-8331.2402-0026

• Graphics and Image Processing •


Underwater Target Detection Using Multi-Information Residual Fusion and Multi-Scale Feature Expression

FU Junshang, TIAN Ying   

  1. College of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, Liaoning 114000, China
  • Online: 2025-06-01  Published: 2025-05-30


Abstract: Blurred underwater images and complex underwater environments cause poor performance and low detection accuracy in existing underwater target detection models. To address these issues, an improved YOLOv7 target detection model, YOLOv7-RMC, is proposed. Firstly, a residual fusion global attention mechanism (RGAM) is designed to extract more critical information from blurred images; it mitigates the loss of contextual feature information as the network deepens, thereby strengthening the model's feature extraction capability. Secondly, to address the loss of information for distant and nearby targets, as well as the loss of detail for small targets, a multi-scale shallow feature fusion network (MSFN) is introduced to optimize and reconstruct the original neck network structure; it improves detection accuracy through repeated information exchange between deep and shallow features. Finally, a lightweight upsampling operator, content-aware reassembly of features (CARAFE), is introduced into the optimized MSFN neck; through feature reorganization and expansion, it enhances the network's feature fusion capability and retains more target information. Experimental results show that the proposed algorithm achieves an mAP@0.5 of 87.4% on the URPC dataset and 97.9% on the Brackish dataset, improvements of 2.3 and 0.9 percentage points over the original YOLOv7 model, demonstrating stronger detection capability and strong practicality.
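The CARAFE operator cited above upsamples by reassembling each output location from a k×k source neighborhood using content-predicted, softmax-normalized kernels rather than a fixed interpolation pattern. A minimal single-channel NumPy sketch of the reassembly step is shown below; the kernel-prediction convolution of the real operator is omitted, and `carafe_upsample` with its arguments is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def carafe_upsample(x, kernels, scale=2, k=3):
    """CARAFE-style reassembly for a single-channel map x of shape (H, W).

    kernels: array of shape (scale*H, scale*W, k*k), one softmax-normalized
    reassembly kernel per output location (predicted by a small conv branch
    in the real operator; supplied directly here for illustration).
    """
    H, W = x.shape
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")  # replicate borders for edge pixels
    out = np.zeros((scale * H, scale * W))
    for i in range(scale * H):
        for j in range(scale * W):
            # each output pixel reassembles the kxk neighborhood around
            # its corresponding source pixel with its own learned weights
            si, sj = i // scale, j // scale
            patch = xp[si:si + k, sj:sj + k]
            out[i, j] = (patch * kernels[i, j].reshape(k, k)).sum()
    return out
```

Because the weights are predicted from content, semantically important pixels can dominate the reassembly, which is how CARAFE retains more target information than nearest or bilinear upsampling.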

Key words: underwater target detection, feature fusion, attention mechanism, YOLOv7, computer vision
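The mAP@0.5 metric reported above counts a detection as a true positive when its intersection-over-union (IoU) with a matched ground-truth box is at least 0.5. A minimal Python sketch of that IoU check, with boxes as (x1, y1, x2, y2) tuples and the function name being illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# e.g. boxes (0,0,2,2) and (1,1,3,3): intersection 1, union 7, IoU = 1/7,
# which is below the 0.5 threshold used by the mAP@0.5 metric
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

mAP@0.5 then averages, over all classes, the area under each class's precision-recall curve built from these IoU-thresholded matches.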