Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (21): 132-140. DOI: 10.3778/j.issn.1002-8331.2207-0198

• Pattern Recognition and Artificial Intelligence •

融合内外依赖的人体骨架动作识别模型

毛国君,王一锦   

  1. 福建工程学院 计算机科学与数学学院,福州 350118
  2. 福建工程学院 福建省大数据挖掘与应用重点实验室,福州 350118
  • Publication date: 2023-11-01  Release date: 2023-11-01

Human Skeleton Action Recognition Model Integrated Internal and External Dependences

MAO Guojun, WANG Yijin   

  1. School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
  2. Key Laboratory of Big Data Mining and Application in Fujian Province, Fujian University of Technology, Fuzhou 350118, China
  • Online: 2023-11-01  Published: 2023-11-01

Abstract: Human action recognition based on dynamic skeleton graphs is an active research topic in computer vision. Traditional recognition methods are mostly built on the local, natural physical connections of the human skeleton (internal dependence). However, many implicit non-local joint connections, such as the interaction between hands and feet, are sometimes too important for action recognition to ignore. This paper introduces the concept of external dependence to represent such implicit non-physical connections, processes the skeleton graph through internal and external dependence mechanisms, and fuses the two in the spatial graph convolution. By further designing a suitable temporal convolution module, a spatial temporal graph convolutional network integrating internal and external dependencies (IED-STGCN) is constructed. Experiments show that IED-STGCN achieves higher recognition accuracy than the existing spatial temporal graph convolutional network (ST-GCN) model: about 2.5 percentage points higher on the Kinetics dataset, 3.4 percentage points higher on the X-Sub dataset, and 3.8 percentage points higher on the X-View dataset. The main techniques in this study are temporal convolution (TC), internal dependence graph convolution (IGC), and external dependence graph convolution (EGC); ablation experiments verify their effectiveness.

Key words: action recognition, spatial temporal graph convolution networks, internal dependence, external dependence
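
The abstract describes fusing internal (physical) and external (non-physical) joint dependencies in a spatial graph convolution, followed by a temporal convolution. The NumPy sketch below illustrates that general idea only; it is not the paper's IED-STGCN. The toy 5-joint skeleton, the hand-to-foot external edges, the additive fusion, the symmetric normalization, and all function names are assumptions made for illustration.

```python
import numpy as np

def build_adjacency(num_joints, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((num_joints, num_joints))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

def normalize_adjacency(adj):
    """Add self-loops, then scale as D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fused_spatial_conv(x, a_int, a_ext, w_int, w_ext):
    """x: (T, V, C_in) -> (T, V, C_out).

    Each branch aggregates neighbor features over its own graph; the two
    branches are summed (an assumed fusion rule) and passed through ReLU.
    """
    h_int = np.einsum("uv,tvc,cd->tud", a_int, x, w_int)
    h_ext = np.einsum("uv,tvc,cd->tud", a_ext, x, w_ext)
    return np.maximum(h_int + h_ext, 0.0)

def temporal_conv(h, kernel):
    """Slide an odd-length 1D filter over the frame axis with zero padding.

    Uses the deep-learning convention (cross-correlation, no kernel flip).
    """
    k = kernel.shape[0]
    pad = k // 2
    h_pad = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    return sum(kernel[i] * h_pad[i:i + h.shape[0]] for i in range(k))

# Hypothetical toy skeleton: 0 torso, 1 left hand, 2 right hand,
# 3 left foot, 4 right foot.
internal_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]  # physical bones
external_edges = [(1, 3), (2, 4)]                  # assumed hand-foot links

a_int = normalize_adjacency(build_adjacency(5, internal_edges))
a_ext = normalize_adjacency(build_adjacency(5, external_edges))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5, 3))      # 8 frames, 5 joints, 3 input channels
w_int = rng.normal(size=(3, 4))
w_ext = rng.normal(size=(3, 4))

out = temporal_conv(fused_spatial_conv(x, a_int, a_ext, w_int, w_ext),
                    np.ones(3) / 3)
print(out.shape)  # (8, 5, 4)
```

In this toy setup, a signal placed on the left foot reaches the left hand in a single layer only through the external branch, since the two joints share no physical bone. That is the kind of non-local interaction the external dependence is meant to capture.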