
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (5): 187-199. DOI: 10.3778/j.issn.1002-8331.2406-0076
• Graphics and Image Processing •
LI Xin, ZHANG Dan, GUO Xin, WANG Song, CHEN Enqing
Online: 2025-03-01
Published: 2025-03-01
LI Xin, ZHANG Dan, GUO Xin, WANG Song, CHEN Enqing. Human Pose Estimation Based on Dual-Stream Fusion of CNN and Transformer[J]. Computer Engineering and Applications, 2025, 61(5): 187-199.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2406-0076
| [1] | GONG Xiaomei, ZHANG Yi, HU Shu. Target Tracking Algorithm with Feature Fusion and Transformer Based Model Predictor [J]. Computer Engineering and Applications, 2025, 61(6): 254-262. |
| [2] | DU Xiaogang, LU Wenjie, LEI Tao, WANG Yingbo. Low-Light Image Enhancement Using Brightness and Signal-to-Noise Ratio Guided Transformer [J]. Computer Engineering and Applications, 2025, 61(6): 263-272. |
| [3] | HUANG Shan, FAN Huijie, LIN Sen, CAO Jinghan, TANG Yandong. Feature Dynamic Library Based on Diffusion Method [J]. Computer Engineering and Applications, 2025, 61(5): 241-249. |
| [4] | JIN Jiali, YU Lu. Continual Image Captioning with Dynamic Token-Used Fusion Feature [J]. Computer Engineering and Applications, 2025, 61(4): 176-191. |
| [5] | WANG Weihang, ZHANG Yi. MLDAC: Multi-Task Dense Attention Computation Self-Supervised Few-Shot Semantic Segmentation Method [J]. Computer Engineering and Applications, 2025, 61(4): 211-221. |
| [6] | PAN Weilan, ZHANG Rongfen, LIU Yuhong, ZHANG Jiyou, SUN Long. Cross-Modal Transparent Object Segmentation Combining CNN-Transformer [J]. Computer Engineering and Applications, 2025, 61(4): 222-229. |
| [7] | YUAN Heng, YAN Tinghao, ZHANG Shengchong. Two-Stage Feature Transfer Image Dehazing Algorithm [J]. Computer Engineering and Applications, 2025, 61(4): 241-252. |
| [8] | FENG Xingyu, ZHU Linglong, ZHANG Yonghong, KAN Xi, CAO Haixiao, MA Guangyi. Change Detection Algorithm Based on Multilateral Feature Guided Aggregation Network [J]. Computer Engineering and Applications, 2025, 61(3): 264-274. |
| [9] | WEI Chao, QIAN Chunyu, HUANG Qipeng, DU Linxuan, YANG Zhe. Improved Model for Table-Line Detection Based on YOLOv8n [J]. Computer Engineering and Applications, 2025, 61(2): 112-123. |
| [10] | YANG Yuge, HAO Yangyang, WANG Yiwen. Sand Cat Swarm Optimization Algorithm Based on Weibull Flight and Warning Mechanism and Its Application [J]. Computer Engineering and Applications, 2025, 61(2): 145-157. |
| [11] | GAO Tengda, REN Zhaoting, SUN Tiejun, WU Chunlei, WANG Leiquan. Multi-Branch Weighted Transformer Hawkes Process [J]. Computer Engineering and Applications, 2025, 61(2): 191-199. |
| [12] | KANG Yu, HAO Xiaoli. Fine Grained Visual Classification Method for Combined Discriminative Region Features [J]. Computer Engineering and Applications, 2025, 61(2): 227-233. |
| [13] | HE Guang, WU Tianjun. Hyperspectral Image Classification Employing Spatial-Spectral Feature Supported by 3D Convolution and Transformer [J]. Computer Engineering and Applications, 2025, 61(2): 259-272. |
| [14] | LI Feixiang, JIANG Ailian. MSMVT: Semi-Supervised Framework with Multi-Scale and Multi-View Transformer for Medical Image Segmentation [J]. Computer Engineering and Applications, 2025, 61(2): 273-282. |
| [15] | JIANG Maoxiang, SI Zhanjun, WANG Xiaozhe. Improved Target Detection Algorithm for UAV Images with RT-DETR [J]. Computer Engineering and Applications, 2025, 61(1): 98-108. |