Computer Engineering and Applications, 2021, Vol. 57, Issue (14): 169-175. DOI: 10.3778/j.issn.1002-8331.2004-0304


Manipulation Action Recognition Based on Gesture Feature Fusion

ZHOU Xiaojing, CHEN Junhong, YANG Zhenguo, LIU Wenyin   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  Online: 2021-07-15; Published: 2021-07-14

To address manipulation action recognition in dynamic, complex scenes, an action recognition framework based on gesture feature fusion is proposed. The framework consists of three modules: an RGB video feature extraction module, a gesture feature extraction module, and an action classification module. The RGB video feature extraction module uses the I3D network to extract the spatial and temporal features of the RGB videos; the gesture feature extraction module uses the Mask R-CNN network to extract the operator's gesture features; and the action classification module fuses these features and feeds them into a classifier. On the EPIC-Kitchens dataset, the proposed method achieves an accuracy of 89.63% for grasp gesture recognition and 74.67% for comprehensive action recognition.
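The fusion and classification step can be illustrated with a short sketch. This is a minimal, hypothetical example, not the authors' implementation: the abstract does not state how the features are merged or which classifier is used, so the sketch assumes simple concatenation (late fusion) followed by a single linear softmax layer. The I3D and Mask R-CNN outputs are replaced by placeholder vectors, and the names `fuse_features`, `LinearClassifier`, `VIDEO_DIM`, `GESTURE_DIM` and `NUM_CLASSES`, as well as their values, are illustrative assumptions.

```python
import math
import random

# Hypothetical dimensions chosen for illustration only (not from the paper).
VIDEO_DIM = 1024   # assumed size of a pooled I3D clip feature
GESTURE_DIM = 256  # assumed size of a pooled Mask R-CNN hand-region feature
NUM_CLASSES = 5    # assumed number of action classes


def fuse_features(video_feat, gesture_feat):
    """Fuse the two feature vectors by concatenation (an assumed fusion scheme)."""
    return list(video_feat) + list(gesture_feat)


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Hypothetical single linear layer plus softmax over the fused vector."""

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real model would learn these from labelled clips.
        self.weights = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def predict_proba(self, x):
        if len(x) != len(self.weights[0]):
            raise ValueError("feature length does not match classifier input size")
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def predict(self, x):
        probs = self.predict_proba(x)
        # Index of the most probable class.
        return max(range(len(probs)), key=probs.__getitem__)


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Placeholder features standing in for real I3D / Mask R-CNN outputs.
video_feat = [0.1] * VIDEO_DIM
gesture_feat = [0.2] * GESTURE_DIM
fused = fuse_features(video_feat, gesture_feat)
clf = LinearClassifier(len(fused), NUM_CLASSES)
probs = clf.predict_proba(fused)
print("predicted class:", clf.predict(fused))
```

Concatenation is the simplest fusion choice: it keeps both feature sets intact and leaves it to the classifier to learn how much weight each receives. Because the weights above are random, the predicted class carries no meaning until the classifier is trained on labelled clips.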

Keywords: gesture feature, manipulation action, video feature extraction, action recognition
