Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 1-29. DOI: 10.3778/j.issn.1002-8331.2404-0143
• Research Hotspots and Reviews •
BIAN Cunling, LYU Weigang, FENG Wei
Online: 2024-10-15
Published: 2024-10-15
BIAN Cunling, LYU Weigang, FENG Wei. Skeleton-based human action recognition: history, status and prospects[J]. Computer Engineering and Applications, 2024, 60(20): 1-29.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2404-0143
[1] ZHANG H B, ZHANG Y X, ZHONG B, et al. A comprehensive survey of vision-based human action recognition methods[J]. Sensors, 2019, 19(5): 1005. [2] WANG Y, CANG S, YU H. A survey on wearable sensor modality centred human activity recognition in health care[J]. Expert Systems with Applications, 2019, 137: 167-190. [3] HERATH S, HARANDI M, PORIKLI F. Going deeper into action recognition: a survey[J]. Image and Vision Computing, 2017, 60: 4-21. [4] KONG Y, FU Y. Human action recognition and rrediction: a survey[J]. International Journal of Computer Vision, 2022, 130(5): 1366-1401. [5] JOHANSSON G. Visual perception of biological motion and a model for its analysis[J]. Perception & Psychophysics, 1973, 14(2): 201-211. [6] PRESTI L L, LA CASCIA M. 3D skeleton-based human action classification: a survey[J]. Pattern Recognition, 2016, 53: 130-147. [7] WANG L, HUYNH D Q, KONIUSZ P. A comparative review of recent kinect-based action recognition algorithms[J]. IEEE Transactions on Image Processing, 2019, 29: 15-28. [8] HAN F, REILY B, HOFF W, et al. Space-time representation of people based on 3D skeletal data: a review[J]. Computer Vision and Image Understanding, 2017, 158: 85-105. [9] REN B, LIU M, DING R, et al. A survey on 3D skeleton-based action recognition using learning method[J]. arXiv:2002.05907, 2020. [10] 王帅琛, 黄倩, 张云飞, 等. 多模态数据的行为识别综述[J]. 中国图象图形学报, 2022, 27(11): 3139-3159. WANG S C, HUANG Q, ZHANG Y F, et al. Review of action recognition based on multimodal data[J]. Journal of Image and Graphics, 2022, 27(11): 3139-3159. [11] 卢健, 李萱峰, 赵博, 等. 骨骼信息的人体行为识别综述[J]. 中国图象图形学报, 2023, 28(12): 3651-3669. LU J, LI X F, ZHAN B, et al. A review of skeleton-based human action recognition[J]. Journal of Image and Graphics, 2023, 28(12): 3651-3669. [12] LV F J, NEVATIA R. Recognition and segmentation of 3-D human action using HMM and multi-class adaBoost[C]//Proceedings of the European Conference on Computer Vision, 2006: 359-372. [13] EVANGELIDIS G, SINGH G, HORAUD R. 
Skeletal quads: human action recognition using joint quadruples[C]//Proceedings of the IEEE International Conference on Pattern Recognition, 2014: 4513-4518. [14] XIA L, CHEN C C, AGGARWAL J K. View invariant human action recognition using histograms of 3D joints[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012: 20-27. [15] RAHMANI H, MAHMOOD A, HUYNH D Q, et al. Real time action recognition using histograms of depth gradients and random decision forests[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2014: 626-633. [16] YANG X, TIAN Y. Effective 3D action recognition using eigenjoints[J]. Journal of Visual Communication and Image Representation, 2014, 25(1): 2-11. [17] WEI P, ZHENG N, ZHAO Y, et al. Concurrent action detection with structural prediction[C]//Proceedings of the IEEE International Conference on Computer Vision, 2013: 3136-3143. [18] JUNEJO I N, DEXTER E, LAPTEV I, et al. View-independent action recognition from temporal self-similarities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33(1): 172-185. [19] ZANFIR M, LEORDEANU M, SMINCHISESCU C. The moving pose: an efficient 3D kinematics descriptor for low-latency action recognition and detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2013: 2752-2759. [20] CHEN C, ZHUANG Y, NIE F, et al. Learning a 3D human pose distance metric from geometric pose descriptor[J]. IEEE Transactions on Visualization and Computer Graphics, 2010, 17(11): 1676-1689. [21] YAO A, GALL J, FANELLI G, et al. Does human action recognition benefit from pose estimation?[C]//Proceedings of the British Machine Vision Conference, 2011. [22] MüLLER M, R?DER T. Motion templates for automatic classification and retrieval of motion capture data[C]//Proceedings of the ACM SIGGRAPH Eurographics Symposium on Computer Animation, 2006: 137-146. 
[23] HUSSEIN M E, TORKI M, GOWAYYED M A, et al. Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2013. [24] VEMULAPALLI R, CHELLAPA R. Rolling rotations for recognizing human actions from 3D skeletal data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016: 4471-4479. [25] WANG C, WANG Y, YUILLE A L. An approach to pose-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2013: 915-922. [26] KE Q, AN S, BENNAMOUN M, et al. Skeletonnet: mining deep part features for 3-D action recognition[J]. IEEE Signal Processing Letters, 2017, 24(6): 731-735. [27] VEMULAPALLI R, ARRATE F, CHELLAPPA R. Human action recognition by representing 3D skeletons as points in a Lie group[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2014: 588-595. [28] HAN L, WU X, LIANG W, et al. Discriminative human action recognition in the learned hierarchical manifold space[J]. Image and Vision Computing, 2010, 28(5): 836-849. [29] OHN-BAR E, TRIVEDI M. Joint angles similarities and HOG2 for action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2013: 465-470. [30] WANG J, LIU Z, WU Y, et al. Mining actionlet ensemble for action recognition with depth cameras[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012: 1290-1297. [31] NIE S, JI Q. Capturing global and local dynamics for human action recognition[C]//Proceedings of the International Conference on Pattern Recognition, 2014: 1946-1951. [32] LEE I, KIM D, KANG S, et al. Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 1012-1020. [33] SONG S, LAN C, XING J, et al. 
An end-to-end spatio-temporal attention model for human action recognition from skeleton data[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2017: 4263-4270. [34] LI S, LI W, COOK C, et al. Independently recurrent neural network (IndRNN): building a longer and deeper RNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 5457-5466. [35] LIU M, LIU H, CHEN C. Enhanced skeleton visualization for view invariant human action recognition[J]. Pattern Recognition, 2017, 68: 346-362. [36] DUAN H, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 2969-2978. [37] BAVIL A F, DAMIRCHI H, TAGHIRAD H D. Action Capsules: human skeleton action recognition[J]. Computer Vision and Image Understanding, 2023, 233: 103722. [38] YAN S, XIONG Y, LIN D. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018. [39] LI M, CHEN S, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 3595-3603. [40] ZHANG P, LAN C, ZENG W, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1112-1121. [41] CHEN Y, ZHANG Z, YUAN C, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 13359-13368. [42] SONG Y F, ZHANG Z, SHAN C, et al. Constructing stronger and faster baselines for skeleton-based action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(2): 1474-1488. [43] CAI D, KANG Y, YAO A, et al. 
Ske2Grid: skeleton-to-grid representation learning for action recognition[C]//Proceedings of the International Conference on Machine Learning, 2023: 3431-3441. [44] PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal transformer networks[J]. Computer Vision and Image Understanding, 2021, 208: 103219. [45] KONG J, BIAN Y, JIANG M. MTT: multi-scale temporal transformer for skeleton-based action recognition[J]. IEEE Signal Processing Letters, 2022, 29: 528-532. [46] WANG L, KONIUSZ P. 3Mformer: multi-order multi-mode transformer for skeletal action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 5620-5631. [47] DU Y, WANG W, WANG L. Hierarchical recurrent neural network for skeleton based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015: 1110-1118. [48] LIU J, SHAHROUDY A, XU D, et al. Spatio-temporal LSTM with trust gates for 3D human action recognition[C]//Proceedings of the European Conference on Computer Vision, 2016: 816-833. [49] WANG H, WANG L. Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 499-508. [50] VEERIAH V, ZHUANG N, QI G J. Differential recurrent neural networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 4041-4049. [51] ZHU W, LAN C, XING J, et al. Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2016: 3697-3704. [52] LIU J, WANG G, HU P, et al. Global context-aware attention lstm networks for 3D action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 1647-1656. [53] DU Y, FU Y, WANG L. 
Skeleton based action recognition with convolutional neural network[C]//Proceedings of the Asian Conference on Pattern Recognition, 2015: 579-583. [54] KE Q, BENNAMOUN M, AN S, et al. A new representation of skeleton sequences for 3D action recognition[C]//Proceedings of the IEEE Conference on Computer Vsion and Pattern Recognition, 2017: 3288-3297. [55] TAS Y, KONIUSZ P. CNN-based action recognition and supervised domain adaptation on 3D body skeletons via kernel feature maps[C]//Proceedings of the British Machine Vision Conference, 2018: 158. [56] WANG P, LI Z, HOU Y, et al. Action recognition based on joint trajectory maps using convolutional neural networks[C]//Proceedings of the 24th ACM International Conference on Multimedia, 2016: 102-106. [57] CAETANO C, SENA J, BRéMOND F, et al. Skelemotion: a new representation of skeleton joint sequences based on motion information for 3D action recognition[C]//Proceedings of the IEEE International Conference on Advanced Video and Signal based Surveillance, 2019: 1-8. [58] MINH L T, INOUE N, SHINODA K. A fine-to-coarse convolutional neural network for 3D human action recognition[C]//Proceedings of the British Machine Vision Conference, 2018: 227. [59] LI C, ZHONG Q, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018:?786-792. [60] LI C, XIE C, ZHANG B, et al. Memory attention networks for skeleton-based action recognition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(9): 4800-4814. [61] 梁成武, 胡伟, 杨杰, 等. 融合时空领域知识与数据驱动的骨架行为识别[J]. 计算机工程与应用: 1-14(2024-02-28) [2024-04-01]. https://link.cnki.net/urlid/11.2127.TP.20240228. 1257.008. LIANG C W, HU W, YANG J, et al. Fusion of spatial-temporal domain knowledge and data-driven for skeleton-based action recognition[J]. 
Computer Engineering and Applications: 1-14(2024-02-28) [2024-04-01]. https://link.cnki.net/urlid/11.2127.TP.20240228.1257.008. [62] XU K, YE F, ZHONG Q, et al. Topology-aware convolutional neural network for efficient skeleton-based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 2866-2874. [63] ZHANG P, LAN C, XING J, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963-1978. [64] LIU Z, ZHANG H, CHEN Z, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 143-152. [65] CHENG K, ZHANG Y, HE X, et al. Skeleton-based action recognition with shift graph convolutional network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 183-192. [66] WEN Y H, GAO L, FU H, et al. Graph CNNs with motif and variable temporal block for skeleton-based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 8989-8996. [67] GAO X, HU W, TANG J, et al. Optimized skeleton-based action recognition via sparsified graph regression[C]//Proceedings of the ACM International Conference on Multimedia, 2019: 601-610. [68] LI B, LI X, ZHANG Z, et al. Spatio-temporal graph routing for skeleton-based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 8561-8568. [69] HUANG Z, SHEN X, TIAN X, et al. Spatio-temporal inception graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the ACM International Conference on Multimedia, 2020: 2122-2130. [70] SHI L, ZHANG Y, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7912-7921. 
[71] THAKKAR K C, NARAYANAN P J. Part-based graph convolutional network for action recognition[C]//Proceedings of the British Machine Vision Conference, 2018: 270. [72] LEE J, LEE M, LEE D, et al. Hierarchically decomposed graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 10444-10453. [73] CHEN T, ZHOU D, WANG J, et al. Learning multi-granular spatio-temporal graph network for skeleton-based action recognition[C]//Proceedings of the ACM International Conference on Multimedia, 2021: 4334-4342. [74] MIAO S, HOU Y, GAO Z, et al. A central difference graph convolutional operator for skeleton-based action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(7): 4893-4899. [75] LEE J, LEE M, CHO S, et al. Leveraging spatio-temporal dependency for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 10255-10264. [76] SI C, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]//Proceedings of the European Conference on Computer Vision, 2018: 103-118. [77] SI C, CHEN W, WANG W, et al. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 1227-1236. [78] ZHOU H, LIU Q, WANG Y. Learning discriminative representations for skeleton based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 10608-10617. [79] 白杉, 冯秀芳. 基于注意力增强的中心差分自适应图卷积的骨架行为识别[J]. 计算机工程与科学, 2023, 45(7): 1263-1273. BAI S, FENG X F. Skeleton behavior recognition based on attention-enhanced central difference adaptive graph convolution[J]. Computer Engineering & Science, 2023, 45(7): 1263-1273. [80] VASWANI A, SHAZEER N, PARMAR N, et al. 
Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008. [81] 卢先领, 杨嘉琦. 时空关联的Transformer骨架行为识别[J]. 信号处理, 2024, 40(4): 766-775. LU X L, YANG J Q. Space-time correlated Transformer for skeleton-based action recognition[J]. Journal of Signal Procesing, 2024, 40(4): 766-775. [82] KIM B, CHANG H J, KIM J, et al. Global-local motion transformer for unsupervised skeleton-based action learning[C]//Proceedings of the European Conference on Computer Vision, 2022: 209-225. [83] CHEN Y, ZHAO L, YUAN J, et al. Hierarchically self-supervised transformer for human skeleton representation learning[C]//Proceedings of the European Conference on Computer Vision, 2022: 185-202. [84] WEN Y, TANG Z, PANG Y, et al. Interactive spatiotemporal token attention network for skeleton-based general interactive action recognition[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023: 7886-7892. [85] DUAN H, XU M, SHUAI B, et al. SkeleTR: towards skeleton-based action recognition in the wild[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 13634-13644. [86] ZHENG N, WEN J, LIU R, et al. Unsupervised representation learning with long-term dynamics for skeleton based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018. [87] SU K, LIU X, SHLIZERMAN E. Predict & cluster: unsupervised skeleton based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 9631-9640. [88] LI L, WANG M, NI B, et al. 3D human action representation learning via cross-view consistency pursuit[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4741-4750. [89] WANG P, WEN J, SI C, et al. Contrast-reconstruction representation learning for self-supervised skeleton-based action recognition[J]. IEEE Transactions on Image Processing, 2022, 31: 6224-6238. 
[90] GUO T, LIU H, CHEN Z, et al. Contrastive learning from extremely augmented skeleton sequences for self-supervised action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 762-770. [91] ZHANG J, LIN L, LIU J. Hierarchical consistent contrastive learning for skeleton-based action recognition with growing augmentations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2023: 3427-3435. [92] DONG J, SUN S, LIU Z, et al. Hierarchical contrast for unsupervised skeleton-based action representation learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2023: 525-533. [93] HUANG X, ZHOU H, WANG J, et al. Graph contrastive learning for skeleton-based action recognition[C]//Proceedings of the Eleventh International Conference on Learning Representations, 2023. [94] ZHOU Y, DUAN H, RAO A, et al. Self-supervised action representation learning from partial spatio-temporal skeleton sequences[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2023: 3825-3833. [95] FRANCO L, MANDICA P, MUNJAL B, et al. Hyperbolic self-paced learning for self-supervised skeleton-based action representations[C]//Proceedings of the Eleventh International Conference on Learning Representations, 2023. [96] LIN L, ZHANG J, LIU J. Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 2363-2372. [97] SHAH A, ROY A, SHAH K, et al. HALP: hallucinating latent positives for skeleton-based self-supervised learning of actions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 18846-18856. [98] YAN H, LIU Y, WEI Y, et al. Skeletonmae: graph-based masked autoencoder for skeleton sequence pre-training[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 5606-5618. [99] ZHANG J, LIN L, LIU J. 
Prompted contrast with masked motion modeling: towards versatile 3D action representation learning[C]//Proceedings of the ACM International Conference on Multimedia, 2023: 7175-7183. [100] SUN S, LIU D, DONG J, et al. Unified multi-modal unsupervised representation learning for skeleton-based action understanding[C]//Proceedings of the ACM International Conference on Multimedia, 2023: 2973-2984. [101] LIU J, AKHTAR N, MIAN A. Adversarial attack on skeleton-based human action recognition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 33(4): 1609-1622. [102] WANG H, HE F, PENG Z, et al. Understanding the robustness of skeleton-based action recognition under adversarial attack[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 14656-14665. [103] DIAO Y, SHAO T, YANG Y L, et al. BASAR: black-box attack on skeletal action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 7597-7607. [104] TANAKA N, KERA H, KAWAMOTO K. Adversarial bone length attack on action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 2335-2343. [105] LU Z, WANG H, CHANG Z, et al. Hard no-box adversarial attack on skeleton-based human action recognition with skeleton-motion-informed gradient[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 4597-4606. [106] PARK H, WANG Z J, DAS N, et al. SkeletonVis: interactive visualization for understanding adversarial attacks on human action recognition models[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 16094-16096. [107] ZHOU Y, QIANG W, RAO A, et al. Zero-shot skeleton-based action recognition via mutual information estimation and maximization[C]//Proceedings of the ACM International Conference on Multimedia, 2023: 5302-5310. [108] SATO F, HACHIUMA R, SEKII T. 
Prompt-guided zero-shot anomaly action recognition using pretrained deep skeleton features[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 6471-6480. [109] YANG F, WU Y, SAKTI S, et al. Make skeleton-based action recognition model smaller, faster and better[C]//Proceedings of the ACM Multimedia Asia, 2019: 1-6. [110] 刘锁兰, 王炎, 王洪元, 等. 基于多流语义图卷积网络的人体行为识别[J]. 计算机工程, 2024, 50(8): 64-74. LIU S L, WANG Y, WANG H Y, et al. Human behavior recognition based on multi-stream semantic graph convolutional network[J]. Computer Engineering, 2024, 50(8): 64-74. [111] HEDEGAARD L, HEIDARI N, IOSIFIDIS A. Continual spatio-temporal graph convolutional networks[J]. Pattern Recognition, 2023, 140: 109528. [112] TANG Y, TIAN Y, LU J, et al. Deep progressive reinforcement learning for skeleton-based action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5323-5332. [113] PENG W, HONG X, CHEN H, et al. Learning graph convolutional network for skeleton-based human action recognition by neural searching[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 2669-2676. [114] HACHIUMA R, SATO F, SEKII T. Unified keypoint-based action recognition framework via structured keypoint pooling[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 22962-22971. [115] XIANG W, LI C, ZHOU Y, et al. Generative action description prompts for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 10276-10285. [116] BIASI N, SETTI F, DEL BUE A, et al. Garment-based motion capture (GaMoCap): high-density capture of human shape in motion[J]. Machine Vision and Applications, 2015, 26(7/8): 955-973. [117] DE LA TORRE F, HODGINS J, BARGTEIL A, et al. Guide to the carnegie mellon university multimodal activity (CMU-MMAC) database[R]. Pittsburgh: Carnegie Mellon University, 2009. 
[118] MüLLER M, R?DER T, CLAUSEN M, et al. Mocap database HDM05[D]. Bonn Universit?t Bonn, 2007. [119] TENORTH M, BANDOUCH J, BEETZ M. The TUMkitchen data set of everyday manipulation activities for motion tracking and action recognition[C]//Proceedings of the International Conference on Computer Vision Workshops, 2009: 1089-1096. [120] OFLI F, CHAUDHRY R, KURILLO G, et al. Berkeley MHAD: a comprehensive multimodal human action database[C]//Proceedings of the IEEE Workshop on Applications of Computer Vision, 2013: 53-60. [121] SIGAL L, BALAN A O, BLACK M J. HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion[J]. International Journal of Computer Vision, 2010, 87(1/2): 4-27. [122] IONESCU C, PAPAVA D, OLARU V, et al. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325-1339. [123] LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2010: 9-14. [124] OREIFEJ O, LIU Z. Hon4D: histogram of oriented 4D normals for activity recognition from depth sequences[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2013: 716-723. [125] FOTHERGILL S, MENTIS H, KOHLI P, et al. Instructing people for training gestural interactive systems[C]//Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012: 1737-1746. [126] SUNG J, PONCE C, SELMAN B, et al. Unstructured human activity detection from RGBD images[C]//Proceedings of the IEEE International Conference on Robotics and Automation, 2012: 842-849. [127] KOPPULA H S, GUPTA R, SAXENA A. Learning human activities and object affordances from RGB-D videos[J]. The International Journal of Robotics Research, 2013, 32(8): 951-970. 
[128] WANG J, NIE X, XIA Y, et al. Cross-view action modeling, learning and recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2014: 2649-2656. [129] JI Y, XU F, YANG Y, et al. A large-scale varying-view RGB-D action dataset for arbitrary-view human action recognition[J]. arXiv:1904.10681, 2019. [130] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+ D: a large scale dataset for 3D human activity analysis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016: 1010-1019. [131] LIU J, SHAHROUDY A, PEREZ M, et al. NTU RGB+ D 120: a large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 42(10): 2684-2701. [132] YUN K, HONORIO J, CHATTOPADHYAY D, et al. Two-person interaction detection using body-pose features and multiple instance learning[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012: 28-35. [133] HU J F, ZHENG W S, LAI J, et al. Jointly learning heterogeneous features for RGB-D activity recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015: 5344-5352. [134] WANG K, WANG X, LIN L, et al. 3D human activity recognition with reconfigurable convolutional neural networks[C]//Proceedings of the ACM International Conference on Multimedia, 2014: 97-106. [135] BLOOM V, MAKRIS D, ARGYRIOU V. G3D: a gaming action dataset and real time action recognition evaluation framework[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012: 7-12. [136] SEIDENARI L, VARANO V, BERRETTI S, et al. Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2013: 479-485. [137] GUYON I, ATHITSOS V, JANGYODSUK P, et al. 
The ChaLearn gesture dataset[J]. Machine Vision and Applications, 2014, 25(8): 1929-1951. [138] WEI P, ZHAO Y, ZHENG N, et al. Modeling 4D human-object interactions for event and object recognition[C]//Proceedings of the IEEE International Conference on Computer Vision, 2013: 3272-3279. [139] ELLIS C, MASOOD S Z, TAPPEN M F, et al. Exploring the trade-off between accuracy and observational latency in action recognition[J]. International Journal of Computer Vision, 2013, 101(3): 420-436. [140] CHEN C, JAFARI R, KEHTARNAVAZ N. UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor[C]//Proceedings of the IEEE International Conference on Image Processing, 2015: 168-172. [141] LILLO I, SOTO A, CARLOS NIEBLES J. Discriminative hierarchical modeling of spatio-temporally composable human activities[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2014: 812-819. [142] WU C, ZHANG J, SAVARESE S, et al. Watch-n-patch: unsupervised understanding of actions and relations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015: 4362-4370. [143] XU N, LIU A, NIE W, et al. Multi-modal & multi-view & interactive benchmark dataset for human action recognition[C]//Proceedings of the ACM International Conference on Multimedia, 2015: 1195-1198. [144] RAHMANI H, MAHMOOD A, Q HUYNH D, et al. HOPC: histogram of oriented principal components of 3D pointclouds for action recognition[C]//Proceedings of the European Conference on Computer Vision, 2014: 742-757. [145] LI Y, LAN C, XING J, et al. Online human action detection using joint classification-regression recurrent neural networks[C]//Proceedings of the European Conference on Computer Vision, 2016: 203-220. [146] LIU C, HU Y, LI Y, et al. PKU-MMD: a large scale benchmark for continuous multi-modal human action understanding[J]. arXiv:1703.07475, 2017. [147] CAO Z, SIMON T, WEI S E, et al. 
Realtime multi-person 2D pose estimation using part affinity fields[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 7291-7299. [148] JHUANG H, GALL J, ZUFFI S, et al. Towards understanding action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision, 2013: 3192-3199. [149] ZHANG W, ZHU M, DERPANIS K G. From actemes to action: a strongly-supervised representation for detailed action understanding[C]//Proceedings of the IEEE International Conference on Computer Vision, 2013: 2248-2255. [150] ZHU Y, CHEN W, GUO G. Fusing spatiotemporal features and joints for 3D action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013: 486-491. [151] HUYNH-THE T, LE B V, LEE S. Describing body-pose feature-poselet-activity relationship using Pachinko allocation model[C]//Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2016: 40-45. [152] HUYNH-THE T, HUA C H, TU N A, et al. Hierarchical topic modeling with pose-transition feature for action recognition using 3D skeleton data[J]. Information Sciences, 2018, 444: 20-35. [153] SU B, WU H, SHENG M, et al. Accurate hierarchical human actions recognition from kinect skeleton data[J]. IEEE Access, 2019, 7: 52532-52541. [154] WEI S, SONG Y, ZHANG Y. Human skeleton tree recurrent neural network with joint relative motion feature for skeleton based action recognition[C]//Proceedings of the IEEE International Conference on Image Processing, 2017: 91-95. [155] GAO X, HU W, TANG J, et al. Generalized graph convolutional networks for skeleton-based action recognition[J]. arXiv:1811.12013, 2018. [156] RHIF M, WANNOUS H, FARAH I R. Action recognition from 3D skeleton sequences using deep networks on Lie group features[C]//Proceedings of the International Conference on Pattern Recognition, 2018: 3427-3432. [157] 吴潇颖, 李锐, 吴胜昔. 基于CNN与双向LSTM的行为识别算法[J]. 计算机工程与设计, 2020, 41(2): 361-366. 
WU X Y, LI R, WU S X. Action recognition algorithm based on CNN and bidirectional LSTM[J]. Computer Engineering and Design, 2020, 41(2): 361-366. [158] WANG H, WANG L. Beyond joints: learning representations from primitive geometries for skeleton-based action recognition and detection[J]. IEEE Transactions on Image Processing, 2018, 27(9): 4382-4394. [159] SHI L, ZHANG Y, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 12026-12035. [160] ZHANG Y, WU B, LI W, et al. STST: spatial-temporal specialized transformer for skeleton-based action recognition[C]//Proceedings of the ACM International Conference on Multimedia, 2021: 3229-3237. [161] JIANG Y, SUN Z, YU S, et al. A graph skeleton transformer network for action recognition[J]. Symmetry, 2022, 14(8): 1547. [162] QIU H, HOU B, REN B, et al. Spatio-temporal tuples transformer for skeleton-based action recognition[J]. arXiv:2201.02849, 2022. [163] BAI R, LI M, MENG B, et al. Hierarchical graph convolutional skeleton transformer for action recognition[C]//Proceedings of the IEEE International Conference on Multimedia and Expo, 2022: 1-6. [164] ZHANG P, LAN C, XING J, et al. View adaptive recurrent neural networks for high performance human action recognition from skeleton data[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2117-2126. [165] IBRAHIM M S, MURALIDHARAN S, DENG Z, et al. A hierarchical deep temporal model for group activity recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1971-1980. [166] WU J, WANG L, WANG L, et al. Learning actor relation graphs for group activity recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 9964-9974. [167] AZAR S M, ATIGH M G, NICKABADI A, et al. 
Convolutional relational machine for group activity recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7892-7901. [168] BIAN C, FENG W, WANG S. Self-supervised representation learning for skeleton-based group activity recognition[C]//Proceedings of the ACM International Conference on Multimedia, 2022: 5990-5998. [169] KHAIRE P, IMRAN J, KUMAR P. Human activity recognition by fusion of RGB, depth, and skeletal data[C]//Proceedings of the International Conference on Computer Vision & Image Processing, 2018: 409-421. [170] HU J F, ZHENG W S, PAN J, et al. Deep bilinear learning for RGB-D action recognition[C]//Proceedings of the European Conference on Computer Vision, 2018: 335-351. [171] ZHAO R, ALI H, VAN DER SMAGT P. Two-stream RNN/CNN for action recognition in 3D videos[C]//Proceedings of the International Conference on Intelligent Robots and Systems, 2017: 4260-4267. [172] HU Z, XIAO J, LI L, et al. Human-centric multimodal fusion network for robust action recognition[J]. Expert Systems with Applications, 2024, 239: 122314. [173] CHÉRON G, LAPTEV I, SCHMID C. P-CNN: pose-based CNN features for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 3218-3226. [174] LI J, XIE X, PAN Q, et al. SGM-Net: skeleton-guided multimodal network for action recognition[J]. Pattern Recognition, 2020, 104: 107356. [175] ZHU X, ZHU Y, WANG H, et al. Skeleton sequence and RGB frame based multi-modality feature fusion network for action recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2022, 18(3): 1-24. [176] HAO T, WU D, WANG Q, et al. Multi-view representation learning for multi-view action recognition[J]. Journal of Visual Communication and Image Representation, 2017, 48: 453-460. [177] WANG Q, SUN G, DONG J, et al. Continuous multi-view human action recognition[J]. 
IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(6): 3603-3614. [178] SIDDIQUI N, TIRUPATTUR P, SHAH M. DVANet: disentangling view and action features for multi-view action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2024: 4873-4881. [179] SONG S, LAN C, XING J, et al. Spatio-temporal attention-based LSTM networks for 3D action recognition and detection[J]. IEEE Transactions on Image Processing, 2018, 27(7): 3459-3471. [180] KE Q, LIU J, BENNAMOUN M, et al. Global regularizer and temporal-aware cross-entropy for skeleton-based early action recognition[C]//Proceedings of the Asian Conference on Computer Vision, 2019: 729-745. [181] YANG D, WANG Y, DANTCHEVA A, et al. LAC: latent action composition for skeleton-based action segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 13679-13690. [182] GAVRILYUK K, GHODRATI A, LI Z, et al. Actor and action video segmentation from a sentence[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 5958-5966. [183] RICHARD A, KUEHNE H, GALL J. Action sets: weakly supervised action segmentation without ordering constraints[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5987-5996. [184] CHEN G, ZHENG Y D, WANG L, et al. DCAN: improving temporal action detection via dual context aggregation[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 248-257. [185] LIN C, LI J, WANG Y, et al. Fast learning of temporal action proposal via dense boundary generator[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 11499-11506. [186] YANG M, CHEN G, ZHENG Y D, et al. BasicTAD: an astounding RGB-only baseline for temporal action detection[J]. Computer Vision and Image Understanding, 2023, 232: 103692. [187] KONG Y, TAO Z, FU Y. 
Deep sequential context networks for action prediction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 1473-1481. [188] WANG X, HU J F, LAI J H, et al. Progressive teacher-student learning for early action prediction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 3556-3565.