High-Precision Fall Detection Algorithm with Improved YOLOv5

doi:10.3778/j.issn.1002-8331.2307-0190

Abstract

Abstract: To address the limitations of the original YOLOv5 in human fall detection, this paper proposes a high-precision fall detection algorithm called C2D-YOLO. The original model struggles to capture complex details, handle deformation, adapt to targets at different scales, and detect occluded targets. Three improvements are made to the YOLOv5 model. First, a new feature extraction module called C2D is added to the backbone network; it combines deformable convolution, standard convolution, and a channel-spatial hybrid attention mechanism to strengthen feature representation, capture complex details, and handle deformation. Second, in the neck network, a Swin Transformer block replaces the bottleneck layer of the C3 module to retain more feature information, which improves detection accuracy for targets at different scales and performance under occlusion. Finally, the head module of YOLOv5 is redesigned along the lines of the decoupled structure of YOLOX to optimise classification and regression performance. Experimental results show that, compared with the existing YOLOv5s, the method improves mAP0.5 by 3.2 percentage points and mAP0.5:0.95 by 6.5 percentage points, clearly increasing detection accuracy and reducing the false detection rate.

Keywords: YOLOv5, fall detection, C2D, Swin Transformer block, decoupled structure

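Abstract (translated from the Chinese): The original YOLOv5 cannot effectively cope with complex detail capture, deformation handling, adaptation to targets at different scales, and occlusion detection in the human fall detection task. To address this, a new high-precision fall detection algorithm, C2D-YOLO, is proposed, based on a YOLOv5 model improved with C2D. A new feature extraction module named C2D is presented; it fuses deformable convolution, standard convolution, and a channel-spatial hybrid attention mechanism, and is added to the backbone network to strengthen feature representation and to better capture complex details and handle deformation. In the neck network, a Swin Transformer block replaces the bottleneck layer of the C3 module, with the aim of retaining as much feature information as possible, improving detection accuracy for targets at different scales, and improving performance under occlusion. Drawing on the decoupled structure of YOLOX, the Head module of YOLOv5 is improved to optimise classification and regression performance. Experimental results show that, compared with the existing YOLOv5s, the method improves mAP0.5 and mAP0.5:0.95 by 3.2 and 6.5 percentage points respectively, clearly improving detection accuracy and reducing the false detection rate.

Keywords (translated from the Chinese): YOLOv5, fall detection, C2D, Swin Transformer block, decoupled structure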
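
The two metrics reported in the abstract differ only in the IoU thresholds at which a detection counts as correct. The abstract does not define them, so the following is a minimal sketch assuming the usual COCO-style convention: mAP0.5 uses a single IoU threshold of 0.5, and mAP0.5:0.95 averages over the ten thresholds 0.50, 0.55, ..., 0.95. All function names here are hypothetical and are not taken from the paper or from the YOLOv5 code base.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds():
    """Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """Thresholds at which the prediction would count as a true positive."""
    value = iou(pred_box, true_box)
    return [t for t in iou_thresholds() if value >= t]


def mean_over_thresholds(ap_by_threshold):
    """Average per-threshold AP values into a single 0.5:0.95 score."""
    ts = iou_thresholds()
    return sum(ap_by_threshold[t] for t in ts) / len(ts)
```

Under this assumed convention, a loosely localised box can count as correct for mAP0.5 and still fail the stricter thresholds, which is one reason the two metrics can improve by different amounts.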

ZHU Shenghao, QIAN Chengshan, KAN Xi. High-Precision Fall Detection Algorithm with Improved YOLOv5[J]. Computer Engineering and Applications, 2024, 60(11): 105-114.

朱胜豪, 钱承山, 阚希. 改进YOLOv5的高精度跌倒检测算法[J]. 计算机工程与应用, 2024, 60(11): 105-114.
