[1] DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 2969-2978.
[2] KHAN M A, JAVED K, KHAN S A, et al. Human action recognition using fusion of multiview and deep features: an application to video surveillance[J]. Multimedia Tools and Applications, 2024, 83(5): 14885-14911.
[3] FANG Z J, LÓPEZ A M. Intention recognition of pedestrians and cyclists by 2D pose estimation[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(11): 4773-4783.
[4] LU M Q, HU Y C, LU X B. Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals[J]. Applied Intelligence, 2020, 50: 1100-1111.
[5] TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 1653-1660.
[6] XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 466-481.
[7] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 5693-5703.
[8] ZHANG W Q, FANG J M, WANG X G, et al. EfficientPose: efficient human pose estimation with neural architecture search[J]. Computational Visual Media, 2021, 7: 335-347.
[9] ZHANG Z, TANG J, WU G S. Simple and lightweight human pose estimation[J]. arXiv:1911.10346, 2019.
[10] YU C Q, XIAO B, GAO C X, et al. Lite-HRNet: a lightweight high-resolution network[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 10440-10450.
[11] LIU X Y, PENG H W, ZHENG N X, et al. EfficientViT: memory efficient vision transformer with cascaded group attention[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14420-14430.
[12] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[13] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9.
[14] WU C P, TAN G X, LI C Y. HEViTPose: high-efficiency vision transformer for human pose estimation[J]. arXiv:2311.13615, 2023.
[15] MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNet v2: practical guidelines for efficient CNN architecture design[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 116-131.
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, 2017.
[17] WANG W H, XIE E Z, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 568-578.
[18] ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 3686-3693.
[19] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 13th European Conference on Computer Vision, 2014: 740-755.
[20] GENG Z G, SUN K, XIAO B, et al. Bottom-up human pose estimation via disentangled keypoint regression[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 14676-14686.
[21] ARANI E, GOWDA S, MUKHERJEE R, et al. A comprehensive study of real-time object detection networks across multiple domains: a survey[J]. arXiv:2208.10895, 2022.
[22] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[23] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25, 2012.
[24] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[25] CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7103-7112.
[26] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv:1602.07360, 2016.
[27] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[28] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. arXiv:1704.04861, 2017.
[29] XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1492-1500.
[30] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[31] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[32] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 10347-10357.
[33] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1251-1258.
[34] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125.
[35] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[36] BA J L, KIROS J R, HINTON G E. Layer normalization[J]. arXiv:1607.06450, 2016.
[37] NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 483-499.
[38] ZHANG H, WU C R, ZHANG Z G, et al. ResNeSt: split-attention networks[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 2736-2746.
[39] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
[40] KINGMA D P, BA J. Adam: a method for stochastic optimization[J]. arXiv:1412.6980, 2014.
[41] XU Y F, ZHANG J, ZHANG Q M, et al. ViTPose: simple vision transformer baselines for human pose estimation[C]//Advances in Neural Information Processing Systems 35, 2022: 38571-38584.
[42] 高坤, 李汪根, 束阳, 等. 融入密集连接的多尺度轻量级人体姿态估计[J]. 计算机工程与应用, 2022, 58(24): 196-204.
GAO K, LI W G, SHU Y, et al. Multi-scale lightweight human pose estimation with dense connections[J]. Computer Engineering and Applications, 2022, 58(24): 196-204.
[43] SUN X, XIAO B, WEI F Y, et al. Integral human pose regression[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 529-545.