Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe

doi:10.3778/j.issn.1002-8331.2308-0097

Abstract

Existing gesture recognition algorithms are computationally expensive and lack robustness. This paper proposes a gesture recognition method based on the IYOLOv5-Med (improved YOLOv5 Mediapipe) algorithm, which combines an improved YOLOv5 algorithm with the Mediapipe method and consists of two parts: gesture detection and gesture analysis. The algorithm reduces the time cost of training and improves the robustness of recognition. For gesture detection, the conventional YOLOv5 algorithm is improved in three ways: the C3 module is reconstructed with FastNet, the CBS module is replaced by the GhostConv module from GhostNet, and an SE attention module is added at the end of the Backbone network. The improved model is smaller and better suited to resource-constrained edge devices. For gesture analysis, a Mediapipe-based method is proposed: hand key points are detected within the gesture region located by the detection part, relevant features are extracted from them, and the gesture is then classified by a naive Bayes classifier. Experimental results confirm the effectiveness of the proposed IYOLOv5-Med algorithm. Compared with the conventional YOLOv5 algorithm, the number of parameters is reduced by 34.5%, the amount of computation by 34.9%, and the model weight size by 33.2%. The final average recognition rate reaches 0.997, and the method is relatively simple to implement, which gives it good application prospects.

Keywords: gesture recognition, YOLOv5, Mediapipe, FastNet, attention mechanism
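
The gesture analysis part described in the abstract (hand key points, then features, then a naive Bayes classifier) can be illustrated with a minimal sketch. The abstract does not say which features are extracted or which naive Bayes variant is used, so everything specific here is an assumption: the feature vector (fingertip-to-wrist distances normalized by palm length), the Gaussian likelihood, the helper names `extract_features`, `GaussianNaiveBayes`, and `toy_hand`, and the synthetic training data. The only detail taken from Mediapipe is its 21-point hand landmark indexing; in practice the landmarks would come from Mediapipe run on the region returned by the detector.

```python
import math
from collections import defaultdict

# Mediapipe hand landmark indices (21 points per hand).
WRIST = 0
MIDDLE_MCP = 9
FINGERTIPS = (4, 8, 12, 16, 20)  # thumb, index, middle, ring, pinky tips


def extract_features(landmarks):
    """Assumed feature vector (not from the paper): each fingertip's distance
    from the wrist, divided by the wrist-to-middle-MCP distance so that the
    features do not depend on how large the hand appears in the image."""
    wx, wy = landmarks[WRIST]
    mx, my = landmarks[MIDDLE_MCP]
    palm = math.hypot(mx - wx, my - wy) or 1e-9  # guard against division by zero
    return [math.hypot(landmarks[i][0] - wx, landmarks[i][1] - wy) / palm
            for i in FINGERTIPS]


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for features, label in zip(X, y):
            grouped[label].append(features)
        self.stats = {}
        for label, rows in grouped.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # A small variance floor keeps the log-likelihood finite.
            variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                         for col, m in zip(columns, means)]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, features):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(features, means, variances))
        return max(self.stats, key=log_posterior)


def toy_hand(tip_lengths):
    """Synthetic 21-point hand for illustration only: wrist at the origin,
    middle MCP at (0, 1), fingertips at the given distances along the y axis."""
    points = [(0.0, 0.0)] * 21
    points[MIDDLE_MCP] = (0.0, 1.0)
    for idx, length in zip(FINGERTIPS, tip_lengths):
        points[idx] = (0.0, length)
    return points


# Synthetic training set with two made-up gesture classes.
train = [
    (toy_hand([0.9, 1.0, 1.0, 0.9, 0.8]), "fist"),
    (toy_hand([0.8, 0.9, 1.1, 1.0, 0.9]), "fist"),
    (toy_hand([1.0, 1.1, 0.9, 0.8, 0.9]), "fist"),
    (toy_hand([1.5, 2.0, 2.2, 2.0, 1.7]), "palm"),
    (toy_hand([1.6, 2.1, 2.1, 1.9, 1.8]), "palm"),
    (toy_hand([1.4, 1.9, 2.3, 2.1, 1.6]), "palm"),
]
model = GaussianNaiveBayes().fit(
    [extract_features(hand) for hand, _ in train],
    [label for _, label in train])

print(model.predict(extract_features(toy_hand([0.9, 1.0, 1.0, 0.9, 0.9]))))
```

Because the classifier only needs a per-class mean and variance for each feature, training is a single pass over the data, which is consistent with the abstract's emphasis on low computational cost; the actual features and classifier settings used by the authors may differ.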

NI Guangxing, XU Hua, WANG Chao. Research on Gesture Recognition Based on Improved YOLOv5 and Mediapipe[J]. Computer Engineering and Applications, 2024, 60(7): 108-118.
