Action Recognition Combined with Lightweight Openpose and Attention-Guided Graph Convolution

doi:10.3778/j.issn.1002-8331.2112-0508

Abstract

Abstract: Existing pose-based human action recognition methods overlook the role of the upstream pose estimation algorithm and do not fully extract action features. This paper proposes an action recognition method that combines a lightweight Openpose with an attention-guided graph convolutional network. The method comprises an Openpose algorithm built on shufflenet and a graph convolution algorithm that applies attention over adjacency matrices of different scales. The input video is processed by the lightweight Openpose to obtain 18 human body keypoints, which are represented as basic spatiotemporal graph data. The adjacency matrices corresponding to each node's neighbor information at different scales are assigned influence weights by a self-attention mechanism; the adjacency matrices of all scales are then weighted, merged, and fed into the graph convolutional network to extract features. The extracted discriminative features pass through global average pooling and a softmax classifier to output the action category. Experiments on the Le2i Fall Detection dataset and the custom UR-KTH dataset show action recognition accuracies of 95.52% and 95.07%, respectively, achieving the expected results.

Key words: action recognition, pose estimation, attention, graph convolutional network
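
The pipeline summarized in the abstract (multi-scale adjacency matrices, attention weights over scales, a merged graph convolution, global average pooling, softmax) can be illustrated with a minimal single-frame sketch. It is not the authors' implementation. The skeleton edge list `EDGES`, the three-value joint features, the random weights `w_gcn` and `w_cls`, and the particular scaled dot-product scoring in `scale_attention` are all assumptions made for illustration; the paper's exact attention formulation and its temporal modelling are not given here.

```python
import math
import random

NUM_JOINTS = 18  # keypoints produced by the pose-estimation stage
# Assumed 18-keypoint skeleton edges (an OpenPose-style layout, illustrative only).
EDGES = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11), (10, 9), (9, 8),
         (11, 5), (8, 2), (5, 1), (2, 1), (0, 1), (15, 0), (14, 0), (17, 15),
         (16, 14)]


def k_hop_adjacency(num_nodes, edges, max_hop):
    """Return [A_0..A_max_hop]; A_k[i][j] = 1 iff shortest-path distance is k."""
    neighbors = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    adjs = [[[0.0] * num_nodes for _ in range(num_nodes)]
            for _ in range(max_hop + 1)]
    for src in range(num_nodes):
        dist, frontier = {src: 0}, [src]
        while frontier:  # breadth-first search from src
            nxt = []
            for u in frontier:
                for v in neighbors[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        for dst, d in dist.items():
            if d <= max_hop:
                adjs[d][src][dst] = 1.0
    return adjs


def row_normalize(a):
    out = []
    for row in a:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def mean_over_nodes(m):
    return [sum(col) / len(m) for col in zip(*m)]


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def scale_attention(x, adjs):
    """Assumed scoring: scaled dot product between the mean input feature
    (query) and the mean feature aggregated at each scale (key)."""
    query = mean_over_nodes(x)
    scores = []
    for a in adjs:
        key = mean_over_nodes(matmul(a, x))
        scores.append(sum(q * k for q, k in zip(query, key))
                      / math.sqrt(len(query)))
    return softmax(scores)


def merge_scales(adjs, weights):
    n = len(adjs[0])
    return [[sum(w * a[i][j] for w, a in zip(weights, adjs))
             for j in range(n)] for i in range(n)]


def classify(x, w_gcn, w_cls, max_hop=2):
    adjs = [row_normalize(a)
            for a in k_hop_adjacency(NUM_JOINTS, EDGES, max_hop)]
    weights = scale_attention(x, adjs)
    merged = merge_scales(adjs, weights)
    # One graph convolution layer: ReLU(A_merged @ X @ W).
    hidden = [[max(0.0, v) for v in row]
              for row in matmul(matmul(merged, x), w_gcn)]
    pooled = mean_over_nodes(hidden)  # global average pooling over joints
    return weights, softmax(matmul([pooled], w_cls)[0])


rng = random.Random(0)
# Hypothetical frame: (x, y, confidence) per joint; weights are untrained.
frame = [[rng.random() for _ in range(3)] for _ in range(NUM_JOINTS)]
w_gcn = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
w_cls = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(8)]
scale_weights, class_probs = classify(frame, w_gcn, w_cls)
print("scale weights:", scale_weights)
print("class probabilities:", class_probs)
```

Because the weights here are random, the printed probabilities carry no meaning; the sketch only shows how the per-scale attention weights sum to one and how the merged adjacency feeds a single graph convolution before pooling and classification.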

ZHANG Fukai, HE Tiancheng. Action Recognition Combined with Lightweight Openpose and Attention-Guided Graph Convolution[J]. Computer Engineering and Applications, 2022, 58(18): 180-187.
