Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (16): 265-273. DOI: 10.3778/j.issn.1002-8331.2201-0229

• Graphics and Image Processing •

Lightweight Human Pose Estimation Based on Attention and Dense Connection

DENG Hui, XU Yang   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2022-08-15  Published: 2022-08-15

Abstract: Most current human pose estimation methods focus on improving prediction accuracy, which leads to networks with large parameter counts and high computational complexity. To ease this trade-off, a lightweight human pose estimation network that integrates attention and dense connections is proposed on the basis of the high-resolution network. Firstly, the bottleneck module of the high-resolution network is redesigned to reduce the computational complexity of part of the network. Secondly, the introduced attention mechanism is improved and combined with dense connections to build a lightweight module, which replaces the basic module of the high-resolution network so that the network retains a certain level of accuracy while greatly reducing the parameter count and computational complexity. Finally, the feature fusion of the network output is redesigned using multi-resolution features and deconvolution to maximize prediction accuracy. Experimental results on the public MPII and COCO datasets show that, compared with the high-resolution network, the proposed model has 71.5% fewer parameters. On the MPII validation set, computational complexity is reduced by 35.8%. On the COCO validation set, computational complexity is reduced by 35.2% and average accuracy increases by 0.6 percentage points. That is, the network effectively reduces complexity while preserving detection accuracy.
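The abstract mixes two kinds of change: relative reductions in percent (parameters, computational complexity) and an absolute gain in percentage points (average accuracy). The following is a minimal sketch of how the two are applied. The function names and the baseline values are hypothetical and chosen only for illustration; they are not figures from the paper, which reports only the relative changes here.

```python
def apply_relative_reduction(baseline: float, reduction_percent: float) -> float:
    """Return what remains of `baseline` after a relative cut of `reduction_percent` percent."""
    if not 0.0 <= reduction_percent <= 100.0:
        raise ValueError("reduction_percent must lie in [0, 100]")
    return baseline * (1.0 - reduction_percent / 100.0)


def relative_reduction_percent(baseline: float, reduced: float) -> float:
    """Return the relative reduction, in percent, going from `baseline` to `reduced`."""
    if baseline <= 0.0:
        raise ValueError("baseline must be positive")
    return (baseline - reduced) / baseline * 100.0


def add_percentage_points(score_percent: float, points: float) -> float:
    """Return a percentage score shifted by an absolute number of percentage points."""
    return score_percent + points


# Hypothetical baselines (NOT taken from the paper), used only to show the arithmetic.
HYPOTHETICAL_BASELINE_PARAMS = 100.0    # arbitrary units
HYPOTHETICAL_BASELINE_ACCURACY = 70.0   # percent

# A 71.5% relative reduction leaves 28.5% of the baseline parameter count.
remaining_params = apply_relative_reduction(HYPOTHETICAL_BASELINE_PARAMS, 71.5)

# A gain of 0.6 percentage points is added directly; it is not a 0.6% relative gain.
new_accuracy = add_percentage_points(HYPOTHETICAL_BASELINE_ACCURACY, 0.6)
```

Under these assumed baselines, the reduced model keeps 28.5 units of parameters, and the accuracy moves from 70.0% to 70.6%; a 0.6% relative gain would instead give 70.42%.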

Key words: human pose estimation, high-resolution network, attention, dense connection, lightweight