YOLO-sea: Improved Complex Undersea Target Detection Algorithm for YOLOv7-tiny

doi:10.3778/j.issn.1002-8331.2407-0233

Abstract

Abstract: Seabed images are typically of poor quality and low resolution, which blurs target edges and makes targets hard to identify, and clusters of small targets further raise the risk of missed and false detections. To address these problems, the YOLO-sea detection network is designed on top of YOLOv7-tiny, chosen because it combines high accuracy with a small model size. First, because small-target features are learned insufficiently in low-resolution scenes and fine-grained information is easily lost, the backbone network is redesigned around SPDConv (space-to-depth convolution) to improve feature extraction for dense small targets in low-resolution scenes. Second, to counter blurred seabed imaging and hard-to-identify target edges, a parameter-shared contrast-enhanced attention mechanism (PSCEA) is designed to strengthen the representation of local details and edge information. Third, drawing on the GELAN architecture of YOLOv9 and the idea of DSConv (dynamic snake convolution), an efficient aggregation module, DSCELAN (DSC-GELAN), is designed; it makes the model lighter while improving its focus on slender seabed targets such as sea cucumbers and fish. Finally, the detection head is restructured to further improve small-target detection. The improved model, YOLO-sea, raises mAP by 2.8 percentage points on the DUO dataset while cutting the number of parameters by 41%, demonstrating the advantages of the proposed improvements for seabed detection. In addition, attention comparison experiments on the mainstream networks YOLOv5s, YOLOv7-tiny, and YOLOv8n show that adding PSCEA raises mAP by 1.1, 1.3, and 0.7 percentage points, respectively, demonstrating the generalization and effectiveness of the mechanism.

Key words: seabed resource monitoring, YOLO, GELAN, DSConv, deep learning
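
The space-to-depth rearrangement that SPDConv builds on can be illustrated generically. The sketch below is not the YOLO-sea backbone: it assumes a scale factor of 2 and a toy feature map stored as nested Python lists, the function name `space_to_depth` is illustrative, and the channel ordering is an arbitrary choice. In SPDConv a non-strided convolution would follow this step; that layer is omitted here.

```python
def space_to_depth(x, scale=2):
    """Rearrange a (C, H, W) nested list into (C*scale*scale, H//scale, W//scale).

    Every input value is kept: spatial resolution is traded for channel depth,
    unlike strided convolution or pooling, which discard or merge pixels.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("H and W must be divisible by scale")
    out = []
    # One group of output channels per (row offset, column offset) pair.
    for dy in range(scale):
        for dx in range(scale):
            for c in range(channels):
                out.append([
                    [x[c][i][j] for j in range(dx, width, scale)]
                    for i in range(dy, height, scale)
                ])
    return out


# Toy single-channel 4x4 feature map holding the values 0..15 (illustrative data).
feature = [[[r * 4 + c for c in range(4)] for r in range(4)]]
packed = space_to_depth(feature)
print(len(packed), len(packed[0]), len(packed[0][0]))  # 4 2 2
```

The 1x4x4 input becomes a 4x2x2 output containing the same 16 values, which is why this kind of downsampling is attractive when small, low-resolution targets cannot afford to lose fine-grained detail.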

LI Rundong, QU Yingwei, YIN Lifeng, ZHENG Guanghai. YOLO-sea: Improved Complex Undersea Target Detection Algorithm for YOLOv7-tiny[J]. Computer Engineering and Applications, 2025, 61(2): 247-258.

