Improved YOLOv5 Helmet Wearing Detection Algorithm for Small Targets

doi:10.3778/j.issn.1002-8331.2305-0209

Abstract

Abstract: Safety helmets are essential protection for construction workers, but existing helmet detection models suffer from false detections and missed detections of overlapping and dense small targets in complex environments. To address this, an improved YOLOv5 algorithm for small-target detection is proposed. A Transformer is added to the YOLOv5 backbone network to capture global information at multiple scales and obtain richer high-level semantic features. GsConv convolution is used to enhance feature fusion, and a coordinate attention mechanism is introduced so that the network attends to a larger region. The detection head decouples classification and regression to speed up convergence. An anchor-free detection method is used to simplify the algorithm structure and speed up detection. The EIOU loss function is used to improve the accuracy of bounding-box prediction. Experimental results on a self-built helmet dataset show that the improved YOLOv5 model achieves an average precision of 96.33%, which is 4.73 percentage points higher than the original YOLOv5 model, meeting the requirements for detecting overlapping and dense small targets under complex conditions.

Key words: safety helmet detection, improved YOLOv5, Transformer, decoupling head, anchor-free
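
The abstract names the EIOU loss as the bounding-box regression term but does not spell it out. As a rough illustration only, the sketch below computes the commonly published EIoU form for a single pair of axis-aligned boxes: 1 - IoU, plus a center-distance penalty, plus separate width and height penalties, each normalized by the smallest enclosing box. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for this example, not details taken from the paper, and the authors' implementation may differ (for instance by operating on batched tensors).

```python
def eiou_loss(pred, target, eps=1e-9):
    """Illustrative EIoU loss for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for this example; not the paper's code.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and target boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing
    # box's squared width and height respectively.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    gt = (2.0, 0.0, 3.0, 1.0)
    print(eiou_loss((0.0, 0.0, 1.0, 1.0), gt))  # disjoint boxes
    print(eiou_loss((1.5, 0.0, 2.5, 1.0), gt))  # partially overlapping
    print(eiou_loss(gt, gt))                    # identical boxes
```

Unlike a plain IoU loss, this form still gives a useful signal when the boxes do not overlap, because the center, width, and height penalties remain nonzero. That property matters for small, densely packed targets, where a prediction can easily miss the ground-truth box entirely.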

DENG Zhenrong, XIONG Yuxu, YANG Rui, CHEN Yuren. Improved YOLOv5 Helmet Wearing Detection Algorithm for Small Targets[J]. Computer Engineering and Applications, 2024, 60(3): 78-87.

