Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (20): 176-183. DOI: 10.3778/j.issn.1002-8331.2207-0022

• Graphics and Image Processing •

Dual-Network Fusion 3D Gesture Interaction Algorithm Based on Deep Residual Network

XUE Jiawei, KONG Weiwei, WANG Ze   

  1. Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  2. Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an 710121, China
  3. Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  4. Guangxi Key Laboratory of Trusted Software, Guilin, Guangxi 541004, China
  • Online: 2023-10-15 Published: 2023-10-15

融合深度残差网络的双网3D手势交互算法

薛佳伟,孔韦韦,王泽   

  1. 西安邮电大学,西安 710121
  2. 陕西省网络数据分析与智能处理重点实验室,西安 710121
  3. 桂林电子科技大学,广西 桂林 541004
  4. 广西可信软件重点实验室,广西 桂林 541004

Abstract: To address problems in existing 3D gesture interaction methods, such as low accuracy and poor reconstruction fidelity, this paper proposes a new 3D hand pose estimation method based on a monocular camera. Its peak performance reaches 85 to 100 FPS while maintaining recognition accuracy. The method is built on a dual-network fusion architecture, whose careful design makes it possible to use all available sources of hand training data. On top of this, a posterior correction network module designed on inverse kinematics is added to correct hand details. This form of output can be applied more directly to visual design and graphical interactive output. In repeated comparative experiments, the proposed architecture improves average recognition accuracy by 1% to 2% over traditional algorithms while maintaining recognition speed. In addition, the MANO-based modeling results are more refined and realistic.

Keywords: gesture interaction, residual network, inverse kinematics, dual-network fusion, 3D modeling
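
As a rough aid to reading the throughput figure in the abstract, the 85 to 100 FPS range can be converted into a per-frame time budget. This is a back-of-the-envelope sketch that assumes frames are processed one at a time, with no batching or pipelining; the symbols $t_{\text{frame}}$ and $f$ are introduced here for illustration and do not appear in the paper.

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{f}, \qquad
t_{85} = \frac{1000\ \text{ms}}{85} \approx 11.8\ \text{ms}, \qquad
t_{100} = \frac{1000\ \text{ms}}{100} = 10\ \text{ms}
```

Under that assumption, both networks together would need to finish within roughly 10 to 12 ms per frame at peak performance.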

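Abstract (Chinese version, translated): Existing 3D gesture interaction methods face a three-way trade-off among accuracy, cost, and generality, and at their current level of development most can achieve only two of the three. Traditional recognition methods usually prioritize cost and generality and therefore sacrifice accuracy. Data-glove-based 3D gesture interaction secures accuracy and generality but cannot avoid high cost. Methods developed for specific domains (for example, sign language teaching or traffic gesture recognition) achieve accuracy and low cost but apply only within their narrow domain. To find a balance point among the three, this paper proposes a new 3D hand pose estimation algorithm that requires, at minimum, only a monocular camera. Its peak performance reaches 85 to 100 FPS while maintaining recognition accuracy. The algorithm is built on a dual-network fusion architecture whose main body consists of a front recognition network and a posterior correction network. The front network, designed on ResNet50, detects hand keypoints by mapping 2D keypoints to 3D. The posterior network, designed on inverse kinematics, corrects the irregular hand shapes caused by mapping errors, so that the recognition result is smoother and consistent with real human anatomy. The careful design of the architecture allows it to draw on a wider range of available hand training data sources, and the added posterior correction module corrects hand details, a form of output that lets the algorithm be applied more directly to visual design and graphical interactive output. In repeated comparative experiments, the architecture improves average recognition accuracy by 1% to 2% over traditional algorithms on various datasets while maintaining recognition speed, and the MANO-based modeling results are more refined and realistic.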

Keywords (Chinese version, translated): gesture interaction, residual network, inverse kinematics, dual-network fusion, 3D modeling