Computer Engineering and Applications, 2024, Vol. 60, Issue (12): 216-224. DOI: 10.3778/j.issn.1002-8331.2306-0011

• Graphics and Image Processing •

Improved YOLOV5s Railway Crack Target Detection Algorithm

MIAO Xinfa, LIU Baolian, LI Xiaoqin, HOU Yue

  1. College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Online: 2024-06-15 Published: 2024-06-14

Abstract: Cracks in rail-surface images are small targets set against heavy background interference in high-resolution images, so detection performance is unsatisfactory. To address this, an improved YOLOV5s target detection algorithm is proposed for rail crack detection. A VOV-GSCSP module is introduced into the neck network, and ordinary convolution is replaced with the lighter GSconv, which reduces the network's computational cost while retaining more detail. The feature pyramid is improved with a proposed multi-path cross-layer fusion structure, which merges backbone features across layers during the downsampling stage of the feature pyramid, preserving more of the original feature information and improving detection accuracy. In addition, a CA attention module and a Transformer structure are introduced to further strengthen the extraction of high-level semantic information. Experimental results show that the improved YOLOV5s algorithm achieves a mean average precision (mAP) of 62.4%, which is 6.2 percentage points higher than the original YOLOV5s algorithm, and a recall of 92.2%, an improvement of 4.4 percentage points.
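The reported gains are absolute differences in percentage points, so the baseline scores can be recovered by subtraction. The short sketch below does that arithmetic. The function names are illustrative only, and the baseline and relative-gain values it prints are derived from the abstract's figures, not numbers reported by the authors.

```python
from decimal import Decimal


def baseline_from_gain(improved_pct: str, gain_points: str) -> Decimal:
    """Baseline score (in %) implied by an improved score and an
    absolute gain expressed in percentage points."""
    return Decimal(improved_pct) - Decimal(gain_points)


def relative_gain(improved_pct: str, gain_points: str) -> Decimal:
    """Gain expressed relative to the implied baseline, in %, to one decimal."""
    baseline = baseline_from_gain(improved_pct, gain_points)
    return (Decimal(gain_points) / baseline * 100).quantize(Decimal("0.1"))


# Figures taken from the abstract: mAP 62.4% (+6.2 points), recall 92.2% (+4.4 points).
baseline_map = baseline_from_gain("62.4", "6.2")
baseline_recall = baseline_from_gain("92.2", "4.4")

print(baseline_map)                   # 56.2
print(baseline_recall)                # 87.8
print(relative_gain("62.4", "6.2"))   # 11.0
print(relative_gain("92.2", "4.4"))   # 5.0
```

Decimal arithmetic is used here so that the one-decimal figures subtract exactly, without binary floating-point rounding.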

Key words: target detection, YOLOV5, GSconv, attention mechanism
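To give a rough sense of why GSconv is described as lighter than ordinary convolution, the sketch below compares weight counts. It assumes the commonly described GSConv layout: a dense convolution produces half of the output channels, a depthwise convolution processes that half, and the two halves are concatenated and channel-shuffled. The 5×5 depthwise kernel, the 128-channel example, and the function names are assumptions for illustration; the abstract does not state the configuration used in the paper.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution, ignoring bias and batch-norm terms."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def gsconv_params(c_in: int, c_out: int, k: int, dw_k: int = 5) -> int:
    """Weight count of the assumed GSConv layout (c_out must be even).

    Dense conv to c_out/2 channels, then a depthwise conv on those channels.
    The concatenation and channel shuffle add no weights.
    """
    half = c_out // 2
    dense = conv_params(c_in, half, k)
    depthwise = conv_params(half, half, dw_k, groups=half)
    return dense + depthwise


# Hypothetical neck layer: 128 -> 128 channels with a 3 x 3 kernel.
standard = conv_params(128, 128, 3)
gs = gsconv_params(128, 128, 3)

print(standard)                 # 147456
print(gs)                       # 75328
print(f"{gs / standard:.3f}")   # 0.511
```

Under these assumptions the layer needs roughly half the weights of an ordinary convolution, since only half of the output channels come from the dense convolution and the depthwise branch is comparatively cheap.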