Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (18): 145-153. DOI: 10.3778/j.issn.1002-8331.2301-0042

• Pattern Recognition and Artificial Intelligence •

Human Pose Estimation Fusing Weight Adaptive Loss and Attention

JIANG Chunling, ZENG Bi, YAO Zhuangze, DENG Bin   

  1. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2023-09-15  Published: 2023-09-15

Abstract: Bottom-up human pose estimation methods suffer from an imbalance between foreground and background samples, and high-resolution networks cannot effectively capture channel and spatial location information during feature extraction and feature fusion. To address these problems, this paper proposes WA-HRNet (weight-adaptive fusing attention HRNet), a bottom-up human pose estimation network built on the high-resolution network HigherHRNet. First, a weight-adaptive loss function is proposed to adaptively adjust the loss weights of different regions, so that HigherHRNet pays more attention to the central regions of human keypoints during training. Second, to obtain rich global information for further localizing keypoint regions, an efficient global attention module is proposed to strengthen the representation of keypoint center regions. Finally, heatmap distribution modulation is introduced to improve the accuracy of decoding keypoint locations from heatmaps. Experiments on the CrowdPose and COCO2017 datasets show that, compared with the baseline HigherHRNet, WA-HRNet improves AP by 5.8 percentage points on the CrowdPose test set and by 1.8 percentage points, to 72.3%, on the COCO2017 test-dev set, outperforming other mainstream bottom-up human pose estimation algorithms.
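The abstract does not give the exact form of the weight-adaptive loss, but the idea of reweighting the heatmap regression loss toward keypoint-center regions can be sketched as follows. This is an illustrative assumption, not the paper's formulation: the function name `weight_adaptive_heatmap_loss` and the `center_boost` parameter are hypothetical, and here the per-pixel weight is simply tied to the target Gaussian activation so that the abundant near-zero background pixels keep a baseline weight while center pixels dominate the loss.

```python
import numpy as np

def weight_adaptive_heatmap_loss(pred, target, center_boost=10.0):
    """Illustrative per-pixel weighted MSE over keypoint heatmaps.

    pred, target: arrays of shape (K, H, W), where target holds the
    Gaussian ground-truth heatmaps. Pixels near keypoint centers (high
    target value) receive larger weights, so training focuses on the
    sparse foreground rather than the dominant background.
    """
    # Background pixels (target ~ 0) keep weight 1; a pixel at a keypoint
    # center (target = 1) gets weight 1 + center_boost.
    weights = 1.0 + center_boost * target
    return float(np.mean(weights * (pred - target) ** 2))
```

With `center_boost = 0` this reduces to the plain MSE heatmap loss used by HigherHRNet-style networks; increasing it shifts the foreground/background balance that the paper identifies as the core problem.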

Key words: human pose estimation, bottom-up, attention, high-resolution network
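The heatmap distribution modulation mentioned in the abstract refines how keypoint coordinates are decoded from predicted heatmaps. The paper's exact modulation scheme is not spelled out here; as a point of reference, the following sketch shows the standard sub-pixel decoding step it improves upon: take the argmax of each heatmap, then shift a quarter pixel toward the larger neighboring activation along each axis.

```python
import numpy as np

def decode_keypoint(heatmap):
    """Standard sub-pixel heatmap decoding (illustrative baseline, not the
    paper's modulation): argmax plus a quarter-pixel shift toward the
    larger of the two neighbors along each axis.
    """
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    fx, fy = float(x), float(y)
    if 0 < x < w - 1:
        fx += 0.25 * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        fy += 0.25 * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    return fx, fy
```

Distribution-modulation approaches smooth the predicted heatmap so its shape better matches the Gaussian training target before this sub-pixel refinement, which reduces the quantization error that the fixed quarter-pixel shift only coarsely corrects.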