Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (14): 148-157. DOI: 10.3778/j.issn.1002-8331.2004-0078

• Pattern Recognition and Artificial Intelligence •

Multi-scale High-Resolution Preserving and Perspective-Invariant Hand Pose Estimation

XIONG Jie, PENG Jun, YANG Wenji, HUANG Lifang

  1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China
  2. School of Software, Jiangxi Agricultural University, Nanchang 330045, China
  3. State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, China
  4. Jiangling Holdings Limited, Nanchang 330052, China
  • Online: 2021-07-15 Published: 2021-07-14

Abstract:

Most current methods for estimating 2D keypoint heatmaps of hand pose from color images rely on convolutional pose machines or Hourglass networks, but neither architecture can both preserve high-resolution representations throughout learning and fuse features across multiple scales. To address this, a multi-scale high-resolution preserving network is adopted. It processes high-resolution and low-resolution representations in parallel, strengthens the features at each resolution by fusing the representations of all resolutions, and uses multiple stages to extract high-quality features for 2D heatmap estimation. To obtain the 3D hand pose, a global-rotation, perspective-invariant method is used to map the 2D heatmaps to the 3D pose. Experiments on 2D and 3D hand pose estimation are conducted on three public datasets (RHD, STB, Dexter+Object), and the results verify the effectiveness of the method for hand pose estimation.

Key words: hand pose estimation, high-resolution representation, multi-scale fusion, perspective-invariant, deep learning
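
The parallel multi-resolution fusion and the 2D heatmap readout described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' network: the function names (`fuse`, `heatmaps_to_keypoints`) are hypothetical, every branch is assumed to have the same number of channels, and average pooling and nearest-neighbour upsampling stand in for whatever learned resampling layers the actual model uses.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def downsample_mean(x, factor):
    """Average-pool a (C, H, W) array by an integer factor (H and W divisible by it)."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def fuse(branches):
    """Fuse parallel branches; branches[i] has shape (C, H / 2**i, W / 2**i).

    Each output branch is the sum of all input branches after resampling
    them to that branch's resolution (assumption: plain summation).
    """
    fused = []
    for i, target in enumerate(branches):
        total = np.zeros(target.shape, dtype=float)
        for j, source in enumerate(branches):
            if j == i:
                total += source
            elif j > i:  # source is coarser than target: upsample
                total += upsample_nearest(source, 2 ** (j - i))
            else:        # source is finer than target: downsample
                total += downsample_mean(source, 2 ** (i - j))
        fused.append(total)
    return fused

def heatmaps_to_keypoints(heatmaps):
    """Return a (K, 2) array of (x, y) peak locations for (K, H, W) heatmaps."""
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)

# Usage with random data; 21 channels and the 64x64 size are illustrative choices.
rng = np.random.default_rng(0)
branches = [rng.random((21, 64 >> i, 64 >> i)) for i in range(3)]
fused = fuse(branches)
keypoints_2d = heatmaps_to_keypoints(fused[0])  # read peaks from the high-resolution branch
```

Reading the peaks from the highest-resolution branch mirrors the idea that the high-resolution representation is kept throughout, so keypoint locations do not have to be recovered from a coarse map.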