Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (16): 194-203.DOI: 10.3778/j.issn.1002-8331.2202-0210

• Pattern Recognition and Artificial Intelligence •

Research on Improved YOLOv5s Sign Language Recognition Algorithm

XING Jinchao, PAN Guangzhen   

  1. School of Software, North University of China, Taiyuan 030000, China
  • Online: 2022-08-15   Published: 2022-08-15

Abstract: To address the difficulty of communication between hearing people and people with hearing impairment, a sign language recognition network based on an improved YOLOv5s is proposed. Firstly, the K-means++ algorithm is used to improve the size matching of the prior anchor boxes and to determine the optimal prior anchor box sizes, so that the prior anchor boxes match the actual objects accurately. Secondly, the channel attention branch of CBAM (convolutional block attention module) is improved to avoid the loss of channel information caused by dimensionality reduction, and the improved CBAM is added to the YOLOv5s backbone so that the model locates and identifies key targets more accurately. Finally, cross-entropy loss and Lovász-Softmax loss are combined in a weighted manner, which makes training converge more stably and further improves precision. Experimental results show that, compared with the original YOLOv5s model, the improved network raises the mean average precision (mAP), precision and recall by 3.44, 3.17 and 1.89 percentage points respectively, effectively improving the detection accuracy of the sign language recognition network.
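To illustrate the channel-attention modification described in the abstract, below is a minimal PyTorch sketch of a CBAM-style channel attention block in which the dimensionality-reducing shared MLP is replaced by a non-reducing mapping, here a 1D convolution across the channel descriptor (as in ECA-Net). The module name, kernel size, and the choice of a 1D convolution are illustrative assumptions, not taken from the paper; the authors' exact modification may differ.

```python
# Sketch: CBAM channel attention without the reduction bottleneck.
# Assumption: the shared MLP (which reduces C -> C/r -> C and loses channel
# information) is replaced by a length-preserving 1D conv over the C channels.
import torch
import torch.nn as nn


class ChannelAttentionNoReduction(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # 1D conv over the channel dimension; padding keeps the number of
        # channels unchanged, so no dimensionality reduction takes place.
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Global average and max pooling give two per-channel descriptors,
        # mirroring CBAM's dual-pooling design.
        avg = torch.mean(x, dim=(2, 3))        # (B, C)
        mx = torch.amax(x, dim=(2, 3))         # (B, C)
        # Apply the same non-reducing 1D conv to both descriptors and sum them,
        # analogous to CBAM's shared MLP followed by element-wise addition.
        avg_att = self.conv(avg.unsqueeze(1))  # (B, 1, C)
        max_att = self.conv(mx.unsqueeze(1))   # (B, 1, C)
        scale = self.sigmoid(avg_att + max_att).view(b, c, 1, 1)
        return x * scale                       # re-weight feature channels


if __name__ == "__main__":
    # Example: re-weight a backbone feature map of shape (1, 128, 40, 40).
    feat = torch.randn(1, 128, 40, 40)
    out = ChannelAttentionNoReduction(kernel_size=3)(feat)
    print(out.shape)  # torch.Size([1, 128, 40, 40])
```

In this sketch the attention weights are produced directly from the full C-dimensional descriptors, so every channel contributes to its own weight; the spatial attention branch of CBAM would be applied afterwards, unchanged.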

Key words: sign language recognition, YOLOv5, K-means++, attention mechanism, loss function
