Computer Engineering and Applications, 2024, Vol. 60, Issue (22): 197-208. DOI: 10.3778/j.issn.1002-8331.2401-0173
• Pattern Recognition and Artificial Intelligence •
WU Chengpeng, TAN Guangxing, CHEN Haifeng, LI Chunyu
Online: 2024-11-15
Published: 2024-11-14
WU Chengpeng, TAN Guangxing, CHEN Haifeng, LI Chunyu. Lightweight and Efficient Human Pose Estimation Fusing Transformer and Attention[J]. Computer Engineering and Applications, 2024, 60(22): 197-208.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2401-0173