Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (17): 212-222. DOI: 10.3778/j.issn.1002-8331.2204-0069

• Graphics and Image Processing •

Three-Stream Head Pose Estimation Algorithm Based on Multi-Stage Feature Fusion

HAN Xue, ZHANG Hongying, LU Xiuwen, ZHANG Qi

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Online: 2023-09-01 Published: 2023-09-01

Abstract: Existing head pose estimation algorithms suffer from poor real-time performance and low recognition rates in complex scenes. To address this, a three-stream head pose estimation algorithm based on multi-stage feature fusion is proposed. The algorithm has a multi-level output structure: three different types of networks each extract features from the input image, and each stream has three stages, with each stage only needing to refine the features of the previous stage. Feature maps extracted at the same stage are passed through a feature fusion module to generate feature mappings, which effectively avoids feature loss. The feature extraction module uses the Ghost module as the feature extraction network, applying model compression to reduce network parameters and computation while maintaining network accuracy. To extract more important and effective features, the efficient channel attention module ECA-Net is introduced, improving the accuracy of head pose estimation. Experimental results show that the proposed algorithm achieves excellent performance on both the AFLW2000 and BIWI datasets: compared with many current head pose estimation methods, the model size is only 0.55 MB, and the mean absolute error (MAE) is reduced to 4.68 on AFLW2000 and 3.59 on BIWI.
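The parameter savings attributed to the Ghost module can be illustrated with simple arithmetic. The sketch below uses the general counting scheme for Ghost modules (a primary convolution producing a reduced set of intrinsic feature maps, followed by cheap depthwise operations that generate the remaining "ghost" maps); it is not this article's specific configuration. The function names, channel counts, kernel sizes, and ratio `s = 2` are assumptions chosen for illustration, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights in a Ghost module (bias ignored).

    A primary k x k convolution produces c_out // s intrinsic feature maps;
    each intrinsic map then yields (s - 1) "ghost" maps through a cheap
    d x d depthwise convolution. This sketch assumes c_out is divisible by s.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s in this sketch")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (s - 1) * d * d
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the article.
    c_in, c_out, k = 64, 128, 3
    plain = conv_params(c_in, c_out, k)
    ghost = ghost_params(c_in, c_out, k, s=2, d=3)
    print(plain, ghost, round(plain / ghost, 2))
```

With these assumed sizes the Ghost variant needs roughly half the weights of the ordinary convolution, which is the kind of reduction the abstract refers to; the actual savings in the proposed network depend on its layer configuration.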

Key words: head pose estimation, GhostNet, efficient channel attention, feature extraction, feature fusion
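The efficient channel attention idea named in the abstract and keywords can be sketched in a few lines. The example below follows the general ECA-Net design (global average pooling per channel, a 1D convolution across neighbouring channels, and a sigmoid gate that rescales each channel); it is a minimal pure-Python illustration, not the authors' implementation. The function name `eca`, the fixed kernel weights, and the input sizes are assumptions; in a real network the kernel is learned.

```python
import math


def eca(feature_maps, kernel):
    """Apply ECA-style channel attention to a list of C feature maps.

    feature_maps: list of C channels, each an H x W list of lists of floats.
    kernel: odd-length list of 1D convolution weights shared by all channels
            (learned in a real network; fixed here for illustration).
    Returns (reweighted feature maps, per-channel attention weights).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    # 1. Global average pooling: one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # 2. 1D convolution across neighbouring channels with zero padding.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(pooled))
    ]
    # 3. Sigmoid gate, then rescale every channel by its weight.
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    scaled = [
        [[w * v for v in row] for row in ch]
        for ch, w in zip(feature_maps, weights)
    ]
    return scaled, weights


if __name__ == "__main__":
    # Four hypothetical 2 x 2 channels with constant values 0, 1, 2, 3.
    x = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
    out, attn = eca(x, kernel=[0.5, 1.0, 0.5])
    print([round(a, 3) for a in attn])
```

Because the only trainable part is the short 1D kernel, this form of attention adds very few parameters, which fits the lightweight model the abstract describes.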