Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (16): 194-203. DOI: 10.3778/j.issn.1002-8331.2202-0210

• Pattern Recognition and Artificial Intelligence •


Research on Improved YOLOv5s Sign Language Recognition Algorithm

XING Jinchao, PAN Guangzhen   

  1. School of Software, North University of China, Taiyuan 030000, China
  • Online:2022-08-15 Published:2022-08-15



Abstract: To address the difficulty of information exchange between hearing people and hearing-impaired people, a sign language recognition network based on an improved YOLOv5s model is proposed. First, the K-means++ algorithm is used to improve the size match of the prior anchor boxes, determining optimal anchor box sizes so that the anchors match real objects accurately. Second, the channel branch of the CBAM (convolutional block attention module) attention mechanism is improved to avoid the loss of channel information caused by dimensionality reduction, and the improved CBAM is added to the YOLOv5s backbone network so that the model locates and identifies key targets more precisely. Finally, Cross Entropy Loss and Lovasz-Softmax Loss are combined with weighting, which makes training converge more stably and yields a further gain in precision. Experimental results show that, compared with the original YOLOv5s model, the improved network raises mean average precision (mAP), precision, and recall by 3.44, 3.17, and 1.89 percentage points respectively, effectively improving the detection accuracy of the sign language recognition network.

Key words: sign language recognition, YOLOv5, K-means++, attention mechanism, loss function