Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (4): 224-234. DOI: 10.3778/j.issn.1002-8331.2204-0432

• Graphics and Image Processing •

Improved YOLOv5 Gesture Recognition Method in Complex Environments

YAN Haoyue (闫颢月), WANG Wei (王伟), TIAN Ze (田泽)

  1. School of Computer Science, Xi’an Engineering University, Xi’an 710048, China
  2. Key Laboratory of Aviation Science and Technology on Integrated Circuit and Micro-System Design, Xi’an 710068, China
  • Online: 2023-02-15 Published: 2023-02-15

Abstract: Gesture detection algorithms have low recognition rates in complex environments because of uneven lighting, near-skin-color backgrounds, and small gesture scales. To address this problem, a gesture recognition method named HD-YOLOv5s is proposed. First, an adaptive Gamma image enhancement preprocessing method based on Retinex theory reduces the effect of illumination changes on gesture recognition. Second, a feature extraction network with the adaptive convolutional attention mechanism SKNet is constructed to improve the network's feature extraction capability and reduce background interference in complex environments. Finally, a new bi-directional feature pyramid structure is built into the feature fusion network; it makes full use of low-level features to reduce the loss of shallow semantic information and improve detection accuracy for small-scale gestures, while cross-level cascading further improves the detection efficiency of the model. To verify the effectiveness of the improved method, experiments were conducted on a self-built dataset with rich contrast in light intensity and on the public dataset NUS-II with complex backgrounds. The recognition rates reached 99.5% and 98.9%, respectively, and detecting a single frame takes only 0.01 s to 0.02 s.

Key words: gesture recognition, YOLOv5, object detection, attention mechanism, bi-directional feature pyramid
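
The first step in the abstract, adaptive Gamma enhancement, can be illustrated with a small sketch. The abstract does not give the paper's formula or its Retinex-based illumination estimate, so the code below is not the authors' method. It is a simpler and commonly used heuristic, assumed here for illustration only: a single global exponent is chosen from the mean brightness of a grayscale image so that the mean value maps to a target level. The function name `adaptive_gamma` and the parameters `target_mean` and `eps` are hypothetical.

```python
import math


def adaptive_gamma(pixels, target_mean=0.5, eps=1e-6):
    """Apply a global gamma chosen from the image's mean brightness.

    pixels: list of rows, each a list of floats in [0, 1] (grayscale).
    Returns (corrected_pixels, gamma).

    Assumption: this is a generic heuristic, not the Retinex-based
    scheme described in the abstract.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # Clamp the mean away from 0 and 1 so log(mean) is finite and non-zero.
    mean = min(max(mean, eps), 1.0 - eps)
    # Choose gamma so that mean ** gamma == target_mean.
    gamma = math.log(target_mean) / math.log(mean)
    # gamma < 1 brightens a dark image; gamma > 1 darkens a bright one.
    corrected = [[p ** gamma for p in row] for row in pixels]
    return corrected, gamma


# Usage: an under-exposed 2x2 patch gets an exponent below 1.
dark = [[0.04, 0.09], [0.16, 0.25]]
bright, gamma = adaptive_gamma(dark)
```

In this sketch an under-exposed frame receives an exponent below 1 and an over-exposed frame an exponent above 1, which is the general idea behind reducing the influence of illumination before detection. A Retinex-style variant would estimate a per-pixel illumination map first and correct that map rather than the raw intensities.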