Computer Engineering and Applications, 2024, Vol. 60, Issue (18): 217-229. DOI: 10.3778/j.issn.1002-8331.2306-0392

• Graphics and Image Processing •

Research on Lightweight and Efficient Bottom-Up Human Pose Estimation Algorithm

MA Sai, GE Haibo, HE Wenhao, CHENG Mengyang, AN Yu   

  1. School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online: 2024-09-15 Published: 2024-09-13

Abstract: To address the high model complexity and computational cost of human pose estimation algorithms, this paper proposes LE-HigherHRNet (lightweight and efficient HigherHRNet), a lightweight and efficient bottom-up human pose estimation network based on HigherHRNet. Depthwise separable convolutions are used to reduce the number of parameters in the feature extraction network. A coordinate attention mechanism is introduced to better capture position information and channel feature information, highlighting the features of small targets and occluded human keypoints in the image. Parallel connections link the multi-stage resolutions, which enhances the network's ability to extract shallow feature information. Skip connections and a lightweight CARAFE upsampling module are used to retain and reconstruct feature information and to enhance spatial position information between high and low resolutions. Experimental results show that, compared with HigherHRNet, the proposed network slightly improves accuracy while significantly reducing the number of model parameters and the computational complexity.

Key words: human pose estimation, lightweight network, coordinate attention, CARAFE upsampling
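
The parameter reduction that the abstract attributes to depthwise separable convolutions can be illustrated with a small calculation. The following is a minimal sketch, not the authors' implementation: the function names and the layer sizes (3×3 kernel, 64 input channels, 128 output channels) are assumptions chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weight count of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)

print(f"standard conv:        {std} weights")
print(f"depthwise separable:  {sep} weights")
print(f"ratio (separable / standard): {sep / std:.3f}")
```

For any layer, the ratio equals 1/c_out + 1/k², so with 3×3 kernels the separable form needs roughly one ninth of the weights once the output channel count is large. How much of this saving carries over to the full network depends on which layers are replaced, which this page does not specify.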