Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (4): 150-157. DOI: 10.3778/j.issn.1002-8331.2309-0467

• Pattern Recognition and Artificial Intelligence •

Skeleton Action Recognition by Integrating Intrinsic Topology and Multi-Scale Time Features

WANG Qi, HE Ning   

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2. College of Smart City, Beijing Union University, Beijing 100101, China
  • Online: 2025-02-15 Published: 2025-02-14

融合内在拓扑与多尺度时间特征的骨架动作识别

王琪,何宁   

  1. 北京联合大学 北京市信息服务工程重点实验室,北京 100101
  2. 北京联合大学 智慧城市学院,北京 100101

Abstract: Graph convolutional networks (GCNs) play a crucial role in skeleton-based human action recognition. Existing GCNs, however, ignore intrinsic relationships, rely on temporal convolutions of limited capacity, and do not fully explore the latent functional correlations between joints and bones. To address these problems, a skeleton action recognition method that integrates intrinsic topology and multi-scale temporal features is proposed. First, to infer the intrinsic topological relationships of the context, the model combines a multi-head self-attention mechanism with a shared topology to construct an intrinsic-topology spatial graph convolution module. Second, a multi-scale temporal convolution module, built on an analysis of complex action sequences, extends the temporal convolution structure and captures multi-scale temporal features. Third, the model builds a bridge for interaction between joint and bone information, so that both kinds of information are effectively transmitted and fused and the functional correlation between them can be explored further. On the NTU-RGB+D 60 dataset, the proposed method achieves a recognition accuracy of 91.5% on the CS benchmark and 96.9% on the CV benchmark; on the NTU-RGB+D 120 dataset, it achieves 89.0% on the C-Sub benchmark and 90.8% on the C-Set benchmark. The experimental results show that the proposed method extracts skeleton spatio-temporal features more effectively and improves recognition accuracy.

Key words: skeleton action recognition, graph convolution, intrinsic topology, multi-scale, information fusion
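
The abstract describes the spatial module only at a high level, so the following is a minimal sketch of one plausible reading rather than the authors' implementation. It assumes that each attention head infers a joint-to-joint graph from the input features, that this graph is added to a shared adjacency matrix, and that the per-head aggregations are averaged. The function names (`attention_topology`, `intrinsic_graph_conv`), the use of channel slices as heads, the absence of learned projection weights, and the additive combination are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_topology(features, num_heads):
    """Infer one V x V attention graph per head from V x C joint features.

    Assumption: each head is a contiguous channel slice and queries equal
    keys (no learned projections), which keeps the sketch self-contained.
    """
    channels = len(features[0])
    assert channels % num_heads == 0, "channels must divide evenly into heads"
    d = channels // num_heads
    graphs = []
    for h in range(num_heads):
        chunk = [row[h * d:(h + 1) * d] for row in features]
        scores = [
            [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in chunk]
            for qi in chunk
        ]
        graphs.append([softmax(r) for r in scores])
    return graphs


def intrinsic_graph_conv(features, shared_adj, num_heads):
    """Aggregate joint features over (shared topology + inferred topology).

    Assumption: the two topologies are summed and head outputs are averaged.
    """
    out = [[0.0] * len(features[0]) for _ in features]
    for attn in attention_topology(features, num_heads):
        combined = [
            [s + a for s, a in zip(s_row, a_row)]
            for s_row, a_row in zip(shared_adj, attn)
        ]
        aggregated = matmul(combined, features)
        out = [
            [o + v / num_heads for o, v in zip(o_row, v_row)]
            for o_row, v_row in zip(out, aggregated)
        ]
    return out


# Hypothetical toy input: 3 joints in a chain, 4 channels, 2 heads.
chain_adj = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
joint_feats = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.1, 0.1]]
fused = intrinsic_graph_conv(joint_feats, chain_adj, num_heads=2)
```

In a trained model the queries, keys, and shared adjacency would be learnable parameters; the sketch only shows how a data-dependent graph and a shared graph could jointly drive the spatial aggregation.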

摘要: 图卷积网络在基于骨架的人体动作识别任务中发挥着关键作用。为了解决现有的图卷积网络忽略内在关系,时间卷积功能受限,以及未能充分探索关节与骨骼之间潜在功能相关性等问题,提出一种融合内在拓扑与多尺度时间特征的骨架动作识别方法。为推断上下文内在拓扑关系,模型利用多头自注意力机制和共享拓扑构建内在拓扑空间图卷积模块;基于复杂的动作序列分析构建多尺度时间卷积模块,旨在扩展时间卷积结构并捕捉多尺度时间特征;模型搭建关节和骨骼信息交互桥梁,实现两者信息的有效传输和融合,以便更深入地探索它们之间的功能相关性。对所提出的方法进行验证,在NTU-RGB+D 60数据集上取得了CS基准91.5%和CV基准96.9%的识别准确率,在NTU-RGB+D 120数据集上分别取得了C-Sub基准89.0%和C-Set基准90.8%的准确率。实验结果表明所提出方法能够更加有效地提取骨架时空特征,进而提升识别精度。

关键词: 骨架动作识别, 图卷积, 内在拓扑, 多尺度, 信息融合