Three-Stream Head Pose Estimation Algorithm Based on Multi-Stage Feature Fusion

doi:10.3778/j.issn.1002-8331.2204-0069

Abstract

Existing head pose estimation algorithms suffer from poor real-time performance and low recognition rates in complex scenes. To address these problems, a three-stream head pose estimation algorithm based on multi-stage feature fusion is proposed. The algorithm has a multi-level output structure: three different types of network branches extract features from the input image, and each branch has three stages, each of which only needs to refine the features of the previous stage. The features extracted at the same stage are passed through a feature fusion module to generate the feature map, which effectively avoids feature loss. The Ghost module is selected as the feature extraction network, using model compression to reduce network parameters and computation while maintaining accuracy. To extract more important and effective features, the efficient channel attention module ECA-Net is introduced, which improves the accuracy of head pose estimation. Experimental results show that the proposed algorithm performs well on both the AFLW2000 and BIWI datasets: compared with many current head pose estimation methods, the model size is only 0.55 MB, and the MAE is reduced to 4.68 on AFLW2000 and 3.59 on BIWI.
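The abstract credits the Ghost module with reducing parameters and computation. As a rough illustration of where that saving comes from, the sketch below compares the weight count of an ordinary convolution with that of a Ghost module, following the general counting argument from the GhostNet paper rather than this paper's actual layer configuration. The function names, the 64-to-128 channel sizes, the ratio s = 2 and the cheap-operation kernel size d = 3 are all illustrative assumptions, and biases and batch normalization are ignored.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weight count of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, s=2, d=3):
    """Weight count of a Ghost module (bias and batch norm ignored).

    s: ratio of total output maps to intrinsic maps (assumed value)
    d: kernel size of the cheap depthwise operations (assumed value)
    """
    # A primary convolution produces only the "intrinsic" feature maps.
    intrinsic = math.ceil(c_out / s)
    primary = c_in * intrinsic * k * k
    # Each intrinsic map spawns (s - 1) "ghost" maps via a d x d depthwise filter.
    cheap = intrinsic * (s - 1) * d * d
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    std = standard_conv_params(64, 128, 3)
    ghost = ghost_module_params(64, 128, 3)
    print(std, ghost, round(std / ghost, 2))  # 73728 37440 1.97
```

With s = 2 the module needs roughly half the weights of the ordinary convolution, and the saving grows as s increases, because more of the output maps come from cheap depthwise filters instead of full convolutions.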

Keywords: head pose estimation, GhostNet, efficient channel attention, feature extraction, feature fusion

HAN Xue, ZHANG Hongying, LU Xiuwen, ZHANG Qi. Three-Stream Head Pose Estimation Algorithm Based on Multi-Stage Feature Fusion[J]. Computer Engineering and Applications, 2023, 59(17): 212-222.
