Computer Engineering and Applications, 2023, Vol. 59, Issue (22): 182-192. DOI: 10.3778/j.issn.1002-8331.2303-0288

• Graphics and Image Processing •

Research on Multi-Target Animal Pose Estimation Based on Improved High Resolution Network

XU Guidong (徐贵冬), XU Yang (徐杨), DENG Hui (邓辉), MO Han (莫寒)

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2023-11-15  Published: 2023-11-15

Abstract: In multi-target animal pose estimation, the various forms of occlusion degrade the detection of animal keypoints. To address this problem, PAENet, a multi-target animal pose estimation network based on an improved high-resolution network, is proposed. First, the bottleneck module of the high-resolution network is redesigned with ACmix, a hybrid convolution that incorporates the self-attention mechanism, to strengthen the network's ability to extract large-scale features. Second, a PSAsblock basic module that connects a channel attention mechanism and a spatial attention mechanism in series is proposed to extract the multi-scale features of animal poses efficiently. Finally, the feature fusion part of the network output is redesigned to make full use of the feature information in the low-resolution branches, and a deconvolution module is added to further improve the accuracy of the network's heatmap regression predictions. Experiments are carried out on AP10K, a recently released large-scale benchmark dataset for mammal pose estimation. The results show that, compared with the high-resolution network currently used for animal pose estimation, PAENet improves the mean average precision (mAP) by 2.4 percentage points and the accuracy on medium-sized objects (APM) by 3.6 percentage points, effectively enhancing the network's ability to extract features of occluded keypoints in multi-target animal pose estimation.

Key words: multi-target animal pose estimation, high resolution network, attention mechanism, multi-scale features
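
The serial arrangement of channel attention and spatial attention mentioned in the abstract can be illustrated with a minimal, parameter-free sketch. This is not the authors' PSAsblock: real attention modules learn their gates with convolutions or small MLPs, whereas the gates here are plain sigmoids of pooled averages. The channel-then-spatial order and all function names are assumptions made only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate computed from its global average.

    feat is a nested list indexed as feat[channel][row][col].
    (Hypothetical simplification: no learned weights.)
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each pixel by a gate computed from its mean across channels."""
    num_channels = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[c][y][x] for c in range(num_channels)) / num_channels)
         for x in range(width)]
        for y in range(height)
    ]
    return [
        [[feat[c][y][x] * gates[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(num_channels)
    ]


def serial_attention(feat):
    """Apply channel attention first, then spatial attention to its output."""
    return spatial_attention(channel_attention(feat))


# Toy feature map: 2 channels, each 2x2.
feat = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
out = serial_attention(feat)
```

Because each stage only rescales its input, the output keeps the input's shape; the second stage sees features that the first stage has already reweighted, which is what distinguishes a serial arrangement from a parallel one.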