Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (7): 134-142. DOI: 10.3778/j.issn.1002-8331.2111-0171

• Pattern Recognition and Artificial Intelligence •


Action Recognition Method Based on Multi-Level Feature Fusion and Temporal Extension

WU Haoyuan, XIONG Xin, MIN Weidong, ZHAO Haoyu, WANG Wenxiang   

  1. School of Information Engineering, Nanchang University, Nanchang 330031, China
    2. Information Department, First Affiliated Hospital of Nanchang University, Nanchang 330006, China
    3. Jiangxi Key Laboratory of Smart City, Nanchang 330047, China
    4. School of Software, Nanchang University, Nanchang 330047, China
  • Online: 2023-04-01  Published: 2023-04-01


Abstract: In recent years, action recognition based on graph convolutional networks (GCNs) has become a research hotspot in the field of computer vision. However, existing GCN-based action recognition methods ignore motion features at the limb level, which makes the extraction of spatial behavior features inaccurate. In addition, these methods lack the ability to model temporal dynamics across interval frames, resulting in insufficient expression of temporal behavior features. To solve these problems, an action recognition method based on a GCN with multi-level feature fusion and temporal extension is proposed. In this method, the multi-level fusion module extracts and fuses low-level joint features and high-level limb features to obtain more discriminative multi-level spatial features. At the same time, the temporal extension module learns rich multi-scale temporal features from adjacent and interval frames, which enhances the temporal expression of behavior features. Experimental results on three large datasets (NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton) show that the recognition accuracy of the proposed method is higher than that of existing action recognition methods.
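As a rough illustration of the temporal extension idea described in the abstract, learning from both adjacent frames and interval frames, the following NumPy sketch applies temporal convolutions with several dilation rates to a sequence of per-frame skeleton features and concatenates the multi-scale responses along the channel axis. The function names, the smoothing kernel, and the dilation set are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def temporal_conv(x, kernel, dilation):
    """1-D temporal convolution over a (T, C) feature sequence.

    A dilation of 1 mixes each frame with its adjacent frames;
    a dilation d > 1 mixes it with interval frames d steps away.
    Zero padding keeps the output length equal to the input length.
    """
    T, C = x.shape
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        for i, w in enumerate(kernel):
            out[t] += w * xp[t + i * dilation]
    return out

def temporal_extension(x, dilations=(1, 2, 4)):
    """Multi-scale temporal feature extraction (hypothetical sketch).

    One branch per dilation rate; branch outputs are concatenated
    along the channel axis, so C input channels become C * len(dilations).
    """
    kernel = np.array([0.25, 0.5, 0.25])  # illustrative smoothing kernel
    branches = [temporal_conv(x, kernel, d) for d in dilations]
    return np.concatenate(branches, axis=1)
```

In a real GCN model the branches would be learned convolutions rather than a fixed kernel, but the structure is the same: small dilations capture frame-to-frame motion while larger dilations capture longer-range dynamics between interval frames.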

Key words: graph convolutional network (GCN), action recognition, multi-level feature fusion, temporal extension