Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (22): 197-208. DOI: 10.3778/j.issn.1002-8331.2401-0173

• Pattern Recognition and Artificial Intelligence •

Lightweight and Efficient Human Pose Estimation Fusing Transformer and Attention

WU Chengpeng, TAN Guangxing, CHEN Haifeng, LI Chunyu   

  1. College of Automation, Guangxi University of Science and Technology, Liuzhou, Guangxi 545616, China
  • Online: 2024-11-15    Published: 2024-11-14

Abstract: To address the heavy computational cost and large network scale of human pose estimation algorithms, a lightweight efficient vision Transformer for human pose estimation (LEViTPose) is proposed. First, a lightweight preprocessing module, LStem, is designed by introducing depthwise separable convolution, channel shuffle, and parallel multi-scale convolution kernels. Then, a cascaded group spatial linear reduction attention (CGSLRA) is proposed, which splits the features into groups to form multiple attention heads, improving memory efficiency, and applies intra-group feature dimensionality reduction to cut computational redundancy. Finally, a lightweight feature recovery module (LFRM) is designed by introducing pointwise convolution and grouped transposed convolution. Experimental results show that, compared with the baseline model, the proposed method improves network performance and inference speed while reducing network size and computational overhead. Compared with LiteHRNet-30 on the MPII and COCO validation sets, the average accuracy is improved by 2.6 and 3.4 percentage points respectively, and the inference speed is doubled.
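To make the grouped, spatially reduced attention described above concrete, the sketch below shows a minimal PyTorch interpretation of such a block. It assumes a PVT-style strided-convolution reduction of the key/value tokens and an EfficientViT-style cascade across channel groups; the class names GroupSRAttention and CGSLRA, the group count, head count, and reduction ratio are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a cascaded group spatial linear reduction attention block.
# All module names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn


class GroupSRAttention(nn.Module):
    """Single-group attention with spatial reduction of keys/values."""

    def __init__(self, dim, num_heads=1, sr_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # A strided convolution shrinks the token grid before the K/V projection,
        # which is where the reduction in attention cost comes from.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
        # Reduce the spatial resolution of the tokens used for keys and values.
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads)
        k, v = kv.permute(2, 0, 3, 1, 4)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class CGSLRA(nn.Module):
    """Cascaded group attention: channels are split into groups, each group is
    attended separately, and each group's output is added to the next group's
    input so information still flows across the whole channel dimension."""

    def __init__(self, dim, num_groups=4, sr_ratio=2):
        super().__init__()
        assert dim % num_groups == 0
        self.num_groups = num_groups
        group_dim = dim // num_groups
        self.attns = nn.ModuleList(
            GroupSRAttention(group_dim, num_heads=1, sr_ratio=sr_ratio)
            for _ in range(num_groups)
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence flattened from an H x W feature map.
        chunks = x.chunk(self.num_groups, dim=-1)
        outs, carry = [], 0
        for attn, chunk in zip(self.attns, chunks):
            g = attn(chunk + carry, H, W)
            carry = g  # cascade: feed this group's output into the next group
            outs.append(g)
        return self.proj(torch.cat(outs, dim=-1))


if __name__ == "__main__":
    feat = torch.randn(1, 64 * 48, 128)         # tokens from a 64x48 feature grid
    print(CGSLRA(dim=128)(feat, 64, 48).shape)  # torch.Size([1, 3072, 128])
```

Under these assumptions, each group only attends over a fraction of the channels and over a spatially reduced key/value set, which is consistent with the memory- and computation-saving behaviour the abstract attributes to CGSLRA.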

Key words: human pose estimation, lightweight network, attention mechanism, Transformer
