Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (19): 166-172. DOI: 10.3778/j.issn.1002-8331.1806-0271

• Pattern Recognition and Artificial Intelligence •

Multimodal Gesture Recognition Algorithm Based on Shallow 3D Dense Networks

DENG Zhifang, YUAN Jiazheng, LIU Hongzhe, YUAN Chunfeng, ZHANG Hongyuan

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2. Office of Academic Research, Beijing Open University, Beijing 100081, China
  3. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2019-10-01 Published: 2019-09-30

Abstract: Gesture recognition aims to understand dynamic gestures of the human body and is one of the most important modes of human-computer interaction. This paper proposes a multimodal gesture recognition method based on a shallow 3D dense network, built by extending the two-dimensional dense network into a three-dimensional dense network and adding the Inception structure; the method is named the Spatial Temporal 3D (ST3D) dense network. The proposed method is evaluated on the public ChaLearn LAP large-scale Isolated Gesture Dataset (IsoGD), where it achieves the best results to date. Experimental results show that the proposed method can effectively learn short-, mid-, and long-term spatiotemporal features of gestures in video samples.

Key words: Spatial Temporal 3D (ST3D) dense network, Inception structure, multimodal, gesture recognition
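
The abstract describes the architecture only at a high level, so the following is a rough illustration of the two ideas it names rather than the authors' network. It traces feature-map shapes through a 3D dense block, where each layer receives the concatenation of all earlier outputs, and through an Inception-style stage, where parallel branches are concatenated along the channel axis. The function names, layer count, growth rate, branch widths, and clip size are all hypothetical, and "same" padding is assumed so that frames, height, and width are unchanged.

```python
from typing import List, Tuple

# (channels, frames, height, width) of a video feature map
Shape = Tuple[int, int, int, int]


def dense_layer_inputs(shape: Shape, num_layers: int, growth_rate: int) -> List[int]:
    """Input-channel count seen by each layer of a 3D dense block.

    Dense connectivity: layer i receives the block input concatenated with
    the outputs of all earlier layers, and each layer emits `growth_rate`
    new feature maps.
    """
    channels = shape[0]
    return [channels + i * growth_rate for i in range(num_layers)]


def dense_block_output(shape: Shape, num_layers: int, growth_rate: int) -> Shape:
    """Shape after the block: the input plus every layer's new feature maps.

    Assumes 'same' padding, so frames/height/width are preserved.
    """
    channels, frames, height, width = shape
    return (channels + num_layers * growth_rate, frames, height, width)


def inception_concat(shape: Shape, branch_channels: List[int]) -> Shape:
    """Shape after concatenating parallel branches along the channel axis.

    Each branch is assumed to preserve frames/height/width.
    """
    _, frames, height, width = shape
    return (sum(branch_channels), frames, height, width)


if __name__ == "__main__":
    clip = (3, 16, 112, 112)  # hypothetical RGB clip: 16 frames of 112x112
    print(dense_layer_inputs(clip, num_layers=4, growth_rate=12))
    block_out = dense_block_output(clip, num_layers=4, growth_rate=12)
    print(block_out)
    print(inception_concat(block_out, branch_channels=[16, 32, 16]))
```

Because every layer reuses all earlier feature maps, the channel count grows linearly with depth, which is one reason a dense network can stay shallow while still combining features computed at several depths.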