Lightweight Facial Expression Recognition with Spatial Group-Wise Enhance

doi:10.3778/j.issn.1002-8331.2206-0268

Abstract: Facial expressions are complex and subtle, which makes high-accuracy recognition difficult. Lightweight networks in particular extract expression features insufficiently and generalize poorly in complex natural environments. To address these problems, a lightweight facial expression recognition method based on spatial group-wise enhance (SGE) attention is proposed. First, a parallel depthwise convolution residual module is designed in the shallow layers of the network to strengthen the representation of local facial details and fuse it with global features. Second, an SGE attention mechanism is built into the deep layers to stabilize the distribution of expression features and improve the model's ability to discriminate subtle changes in expression. Finally, to avoid overfitting, the output structure of the backbone network is improved without a large increase in computational complexity. The method achieves accuracies of 88.33% and 63.09% on the public seven-class datasets RAF-DB and AffectNet-7, and 60.12% on the eight-class dataset AffectNet-8. The experimental results show that the proposed method reduces the number of network parameters while improving recognition accuracy, which demonstrates its effectiveness and its potential for practical application.
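To make the spatial group-wise enhance step more concrete, the NumPy sketch below follows the general recipe of the SGE formulation by Li, Hu and Yang (arXiv:1905.09646): channels are split into groups, each spatial position is scored against its group's pooled descriptor, the scores are normalized over space, and a sigmoid gate rescales the features. It is only an illustration under stated assumptions, not the network proposed here. The function name, the default `gamma`/`beta` values, `eps`, and the use of the population standard deviation are choices made for this example.

```python
import numpy as np

def spatial_group_enhance(x, groups, gamma=None, beta=None, eps=1e-5):
    """Illustrative SGE gate. x has shape (batch, channels, height, width)."""
    b, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # Per-group scale and shift; these defaults are assumptions for the sketch.
    gamma = np.zeros(groups) if gamma is None else np.asarray(gamma, dtype=float)
    beta = np.ones(groups) if beta is None else np.asarray(beta, dtype=float)

    # Split channels into groups: (b, groups, c // groups, h, w).
    xg = x.reshape(b, groups, c // groups, h, w)
    # Group descriptor: spatial average of every channel in the group.
    g = xg.mean(axis=(3, 4), keepdims=True)
    # Dot product between each position's feature vector and the descriptor.
    coef = (xg * g).sum(axis=2)  # (b, groups, h, w)
    # Normalize the coefficients over the spatial positions of each group.
    mu = coef.mean(axis=(2, 3), keepdims=True)
    sigma = coef.std(axis=(2, 3), keepdims=True)
    coef = (coef - mu) / (sigma + eps)
    # Scale, shift, and squash into a gate in (0, 1).
    a = coef * gamma[None, :, None, None] + beta[None, :, None, None]
    gate = 1.0 / (1.0 + np.exp(-a))
    # Apply the same gate to every channel of the group.
    return (xg * gate[:, :, None, :, :]).reshape(b, c, h, w)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))
y = spatial_group_enhance(x, groups=4, gamma=np.ones(4), beta=np.zeros(4))
print(y.shape)  # (2, 8, 4, 4)
```

The gate adds only two scalars per group, which is consistent with using such a module in a lightweight model. In a trained network `gamma` and `beta` would be learned rather than fixed.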

Key words: facial expression recognition, depthwise separable convolution, regional feature fusion, spatial group-wise enhance, lightweight
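The keyword "depthwise separable convolution" is closely tied to the parameter reduction mentioned in the abstract. The short calculation below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are hypothetical, chosen only for illustration, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer sizes, not taken from the paper.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable, round(standard / separable, 2))  # 73728 8768 8.41
```

For this example layer the separable form needs roughly one eighth of the weights, which is the usual motivation for using it in lightweight backbones.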

LIU Jin, LUO Xiaoshu, XU Zhaoxing. Lightweight Facial Expression Recognition with Spatial Group-Wise Enhance[J]. Computer Engineering and Applications, 2023, 59(22): 233-241.