Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (5): 165-176. DOI: 10.3778/j.issn.1002-8331.2310-0292
• Pattern Recognition and Artificial Intelligence •
LIANG Chengwu, HU Wei, YANG Jie, JIANG Songqi, HOU Ning
Online: 2025-03-01
Published: 2025-03-01
LIANG Chengwu, HU Wei, YANG Jie, JIANG Songqi, HOU Ning. Fusion of Spatio-Temporal Domain Knowledge and Data-Driven for Skeleton-Based Action Recognition[J]. Computer Engineering and Applications, 2025, 61(5): 165-176.