Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (20): 1-29. DOI: 10.3778/j.issn.1002-8331.2404-0143
边存灵,吕伟刚,冯伟
BIAN Cunling, LYU Weigang, FENG Wei
Online:
2024-10-15
Published:
2024-10-15
Abstract: Human action recognition has important application prospects in video surveillance, human-computer interaction, medical care, sports event analysis, and other fields. In recent years, with the rapid development of sensor technology and human pose estimation algorithms, skeleton-based human action recognition has attracted growing attention from researchers. Compared with conventional video image data, skeleton data are centered on the actor and offer highly abstract motion information at low data dimensionality, providing a new perspective for modeling action information. Taking skeleton-based human action recognition as its subject, this paper gives a comprehensive and systematic review and analysis of related work. It organizes the published literature through bibliometric analysis and summarizes the development of skeleton-based action recognition. On this basis, it reviews both traditional recognition methods based on handcrafted features and recognition methods based on deep learning, focusing on the basic principles, improvement strategies, and representative work of approaches built on convolutional neural networks, recurrent neural networks, graph convolutional networks, and Transformers, and briefly discusses the state of research on learning algorithms for these network models. It then summarizes three categories of public datasets, built from motion capture systems, Kinect cameras, and RGB images, and discusses their characteristics and applications in detail. Finally, combining the research status at home and abroad with further analysis, the paper identifies the key difficulties and challenges in skeleton-based human action recognition and looks ahead to future directions, aiming to give researchers a relatively complete view of the field and to serve as a reference for work in related areas.
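As a concrete illustration of the data format and the graph-convolution idea summarized above, the sketch below encodes a skeleton sequence as a frames × joints × coordinates array and applies one symmetrically normalized spatial graph-convolution step in the style of ST-GCN. It is a minimal, hypothetical example: the 5-joint topology, the random input, and the weight shapes are assumptions made for illustration, not the implementation of any surveyed method.

```python
import numpy as np

# Illustrative toy skeleton: 5 joints. Real datasets such as NTU RGB+D
# use 25 Kinect joints; this topology is an assumption for the sketch.
V, C, T = 5, 3, 30                        # joints, coordinates (x, y, z), frames
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]  # bones of the toy skeleton

# Adjacency with self-loops, symmetrically normalized:
# A_hat = D^{-1/2} (A + I) D^{-1/2}
A = np.eye(V)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# A skeleton sequence: T frames x V joints x C coordinates (random stand-in data)
rng = np.random.default_rng(0)
X = rng.standard_normal((T, V, C))

# One spatial graph-convolution step applied per frame: H = ReLU(A_hat X W)
C_out = 8
W = rng.standard_normal((C, C_out)) * 0.1
H = np.maximum(0.0, A_hat @ X @ W)  # matmul broadcasts over the T frame axis
print(H.shape)                      # (30, 5, 8): per-frame, per-joint features
```

In full models of the kind the survey covers, such a spatial step typically alternates with temporal convolutions over the frame axis, and the adjacency itself may be refined or learned; that is one of the main axes of variation among the GCN-based methods reviewed in the paper.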
边存灵, 吕伟刚, 冯伟. 骨架人体行为识别研究回顾、现状及展望[J]. 计算机工程与应用, 2024, 60(20): 1-29.
BIAN Cunling, LYU Weigang, FENG Wei. Skeleton-Based Human Action Recognition: History, Status and Prospects[J]. Computer Engineering and Applications, 2024, 60(20): 1-29.