Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (18): 180-187. DOI: 10.3778/j.issn.1002-8331.2112-0508

• Pattern Recognition and Artificial Intelligence •

Action Recognition Combined with Lightweight Openpose and Attention-Guided Graph Convolution

ZHANG Fukai, HE Tiancheng

  1. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
  • Online: 2022-09-15 Published: 2022-09-15

Abstract: Existing pose-based human action recognition methods overlook the role of the upstream pose estimation algorithm and do not fully extract action features. This paper proposes an action recognition method that combines a lightweight Openpose with an attention-guided graph convolutional network. The method comprises a ShuffleNet-based Openpose algorithm and a graph convolution algorithm based on attention over adjacency matrices of different scales. The input video is processed by the lightweight Openpose to obtain 18 human body keypoints, which are represented as basic spatiotemporal graph data. A self-attention mechanism computes the influence of the adjacency matrix corresponding to each scale of a node's neighborhood; the adjacency matrices of all scales are then weighted, merged, and fed into the graph convolutional network to extract features. The extracted discriminative features pass through global average pooling and a softmax classifier to output the action category. Experiments on the Le2i Fall Detection dataset and a custom UR-KTH dataset show action recognition accuracies of 95.52% and 95.07%, respectively, achieving the expected results.

Key words: action recognition, pose estimation, attention, graph convolutional network
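
The pipeline summarized in the abstract (multi-scale adjacency matrices, attention-weighted merging, graph convolution, global average pooling, softmax) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the attention formulation, the number of scales, or the layer sizes, so the edge list (the 18-keypoint layout commonly used with OpenPose output), the scoring rule in `scale_attention`, the three scales, and all weights and dimensions below are assumptions chosen for illustration.

```python
import numpy as np

NUM_JOINTS = 18
# Assumed edge list for an 18-keypoint skeleton (not taken from the paper).
EDGES = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11), (10, 9), (9, 8),
         (11, 5), (8, 2), (5, 1), (2, 1), (0, 1), (15, 0), (14, 0),
         (17, 15), (16, 14)]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def skeleton_adjacency():
    adj = np.zeros((NUM_JOINTS, NUM_JOINTS))
    for i, j in EDGES:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def k_hop_adjacency(adj, k):
    """Binary matrix marking joint pairs whose shortest-path distance is exactly k."""
    n = adj.shape[0]
    if k == 0:
        return np.eye(n)
    step = ((adj + np.eye(n)) > 0).astype(int)
    within = np.eye(n, dtype=int)            # pairs reachable within 0 hops
    for _ in range(k):
        previous = within                    # pairs reachable within k-1 hops
        within = ((within @ step) > 0).astype(int)
    return ((within - previous) > 0).astype(float)


def row_normalize(a):
    deg = a.sum(axis=1, keepdims=True)
    return np.divide(a, deg, out=np.zeros_like(a), where=deg > 0)


def scale_attention(x, scales):
    """Assumed scoring rule: scaled dot-product attention between per-scale descriptors.

    x: (T, V, C) keypoint sequence; scales: (K, V, V) normalized adjacency matrices.
    Returns K weights that sum to 1.
    """
    agg = np.einsum("kvw,twc->ktvc", scales, x)       # neighbor aggregation per scale
    desc = agg.mean(axis=(1, 2))                      # (K, C) one descriptor per scale
    scores = desc @ desc.T / np.sqrt(desc.shape[1])   # (K, K)
    attn = softmax(scores, axis=-1)
    return attn.mean(axis=0)                          # average attention each scale receives


def forward(x, scales, w_gcn, w_cls):
    weights = scale_attention(x, scales)
    merged = np.tensordot(weights, scales, axes=1)    # weighted sum of scales, (V, V)
    hidden = np.maximum(np.einsum("vw,twc->tvc", merged, x) @ w_gcn, 0.0)  # graph conv + ReLU
    pooled = hidden.mean(axis=(0, 1))                 # global average pooling over frames and joints
    return softmax(pooled @ w_cls), weights, merged


rng = np.random.default_rng(0)
x = rng.normal(size=(30, NUM_JOINTS, 2))              # 30 frames of random (x, y) keypoints
adj = skeleton_adjacency()
scales = np.stack([row_normalize(k_hop_adjacency(adj, k)) for k in range(3)])
w_gcn = rng.normal(scale=0.1, size=(2, 16))           # untrained illustrative weights
w_cls = rng.normal(scale=0.1, size=(16, 4))           # 4 hypothetical action classes
probs, weights, merged = forward(x, scales, w_gcn, w_cls)
print("scale weights:", weights)
print("class probabilities:", probs)
```

With random inputs and untrained weights the printed probabilities carry no meaning; the sketch only shows how the per-scale weights combine the adjacency matrices before a single graph convolution. A trained model would learn `w_gcn`, `w_cls`, and any attention parameters from labeled skeleton sequences.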