Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (5): 165-176. DOI: 10.3778/j.issn.1002-8331.2310-0292

• Pattern Recognition and Artificial Intelligence •

Fusion of Spatio-Temporal Domain Knowledge and Data-Driven for Skeleton-Based Action Recognition

LIANG Chengwu, HU Wei, YANG Jie, JIANG Songqi, HOU Ning

  1. College of Electrical Engineering and New Energy, China Three Gorges University, Yichang, Hubei 443002, China
  2. School of Electrical and Control Engineering, Henan University of Urban Construction, Pingdingshan, Henan 467036, China
  • Online: 2025-03-01 Published: 2025-03-01

融合时空领域知识与数据驱动的骨架行为识别

梁成武,胡伟,杨杰,蒋松琪,侯宁   

  1. 三峡大学 电气与新能源学院, 湖北 宜昌 443002
  2. 河南城建学院 电气与控制工程学院, 河南 平顶山 467036

Abstract: Skeleton-based action recognition has attracted growing attention because skeleton data are compact and robust to background interference. However, how to fuse spatio-temporal domain knowledge of skeleton actions into existing data-driven methods has not been fully investigated. To address this, this paper proposes a skeleton-based action recognition method that fuses prior spatio-temporal domain knowledge of human actions with an improved CNN architecture. First, a temporal-channel focusing module is proposed based on domain knowledge of key spatio-temporal features; it generates an aggregation coefficient matrix that guides the model to focus on discriminative feature representations. Then, a multi-scale convolutional fusion module is proposed that incorporates domain knowledge of long spatio-temporal spans; it uses grouped residual connections along the channel dimension to flexibly enlarge the temporal receptive field of the convolution, so that long-span spatio-temporal features can be represented without introducing a large number of parameters. The method is evaluated on three large-scale datasets, NTU RGB+D, NTU RGB+D 120, and FineGYM, and achieves recognition accuracies of 96.6%, 89.6%, and 94.1%, respectively. The results show that fusing spatio-temporal domain knowledge with data-driven learning can fully exploit the spatio-temporal features of skeleton actions, improve recognition performance, and generalize across datasets.

Key words: spatio-temporal domain knowledge, data-driven, skeleton-based action recognition, convolutional neural networks, long-time modeling
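
The abstract states that grouped residual connections along the channel dimension enlarge the temporal receptive field without adding many parameters. The sketch below illustrates that idea with simple arithmetic. It is not the authors' module: it assumes a Res2Net-style hierarchy, in which the first channel group is passed through unchanged and each later group applies one temporal convolution to its own input plus the previous group's output, with stride 1, no dilation, and no bias terms. The function names are hypothetical.

```python
def group_receptive_fields(num_groups, kernel_size):
    """Temporal receptive field (in frames) of each channel group.

    Assumed layout: group 0 is an identity path (receptive field 1);
    group i >= 1 convolves (its input + output of group i-1), so each
    extra group stacks one more convolution and adds (kernel_size - 1)
    frames to the receptive field.
    """
    return [1 + i * (kernel_size - 1) for i in range(num_groups)]


def grouped_weight_count(channels, num_groups, kernel_size):
    """Weights used by the (num_groups - 1) per-group temporal convolutions."""
    width = channels // num_groups  # channels handled by each group
    return (num_groups - 1) * width * width * kernel_size


def single_conv_weight_count(channels, kernel_size):
    """Weights of one full-width temporal convolution over all channels."""
    return channels * channels * kernel_size


if __name__ == "__main__":
    channels, num_groups, kernel_size = 64, 4, 3  # illustrative values only
    fields = group_receptive_fields(num_groups, kernel_size)
    print("receptive field per group:", fields)
    print("grouped weights:", grouped_weight_count(channels, num_groups, kernel_size))
    # A single convolution needs a kernel as long as the largest receptive
    # field above to cover the same temporal span in one layer.
    print("single-conv weights:", single_conv_weight_count(channels, fields[-1]))
```

Under these assumptions, 64 channels split into 4 groups with a kernel size of 3 give per-group receptive fields of 1, 3, 5, and 7 frames using 2304 weights. A single full-width convolution spanning 7 frames would need 28672.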
