Improved Efficient Real-Time Instance Segmentation Model Based on YOLOv5s-Seg

doi:10.3778/j.issn.1002-8331.2311-0378

Abstract

Abstract: Instance segmentation is a core task in image segmentation and an important topic in computer vision. Existing instance segmentation models, however, cannot maintain segmentation accuracy while also running in real time, so real-time instance segmentation still suffers from low accuracy and imprecise localization. To address this problem, the paper proposes an improved real-time instance segmentation model based on YOLOv5s-Seg. With YOLOv5s-Seg as the base model, the Repvit m3 network is chosen as the backbone. The FPN structure is then refined: its original C3 convolution modules are replaced with RsRepVitBlock modules, which use the ECA attention mechanism internally. Finally, SIoU is adopted as the bounding-box loss function. On the public PASCAL VOC 2012 dataset, the improved model reaches a segmentation accuracy of 65.7% mAP, 10.6 percentage points higher than the original YOLOv5s-Seg. The model substantially improves segmentation accuracy and effectively mitigates inaccurate localization in segmentation tasks. Compared with other models, it shows a clear accuracy advantage and better model stability.

Key words: real-time instance segmentation, YOLOv5s-Seg, Repvit m3, RsRepVitBlock, efficient channel attention (ECA), SIoU
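
The abstract names efficient channel attention (ECA) as a component of the RsRepVitBlock module but does not describe how it works. The sketch below illustrates the general ECA idea as published in the ECA-Net paper: global average pooling per channel, a small 1D convolution across neighboring channels, and a sigmoid gate that rescales each channel. It is not the authors' implementation; the function names `eca_kernel_size` and `eca`, the plain-list tensor layout, and the hand-supplied kernel weights are all assumptions made for illustration, and where ECA sits inside RsRepVitBlock is not stated here.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: (log2(C) + b) / gamma, truncated and made odd.

    gamma=2 and b=1 are the defaults used in the ECA-Net paper.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply channel attention to a feature map (illustrative, pure Python).

    feature_map: list of C channels, each a list of H rows of W floats.
    weights: 1D kernel of odd length k, shared across all channel positions.
             In a trained network these would be learned; here they are given.
    Returns (rescaled feature map, per-channel gates).
    """
    pad = len(weights) // 2
    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1D convolution along the channel axis, zero-padded at both ends.
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(pooled))
    ]
    # 3) Sigmoid gate per channel, then rescale every value in that channel.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return scaled, gates
```

For example, `eca_kernel_size(256)` yields a kernel of size 5, so each channel's gate depends only on its own pooled value and those of its four nearest channels. This local cross-channel interaction is what keeps ECA lightweight compared with fully connected channel attention.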

MA Dongmei, GUO Zhihao, LUO Xiaoyun. Improved Efficient Real-Time Instance Segmentation Model Based on YOLOv5s-Seg[J]. Computer Engineering and Applications, 2024, 60(16): 258-268.
