Facial Expression Recognition with Distinguishing Feature Filtering and ViT

doi:10.3778/j.issn.1002-8331.2207-0420

Abstract: In real human-computer interaction scenarios, people's dynamic behaviors (head turning, walking, etc.) and unstable light sources prevent fine facial detail features from being extracted effectively, which lowers the accuracy of facial expression recognition. To address this problem, an optimized model combining distinguishing feature filtering and a vision transformer (ViT) is proposed. First, a weighted-sum illumination normalization method balances the brightness of the original image, and a convolutional neural network extracts facial features. Next, a distinguishing feature filtering module (an improved feature attention module) aggregates local-global facial context information. A multi-layer Transformer encoder then strengthens the associations among facial features. Finally, a Softmax function predicts the facial expression. Experimental results show that the model achieves good performance on the RAF-DB, FERPlus, and AffectNet datasets.
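The abstract names a weighted-sum illumination normalization step but does not give its formula. A minimal sketch, assuming the method forms a weighted sum of differently normalized copies of a grayscale image, could blend global histogram equalization with gamma correction. The two component methods, the weight `w`, the `gamma` value, and every function name below are illustrative assumptions, not the authors' settings; images are flattened lists of 8-bit pixel values so the example needs no third-party libraries.

```python
from typing import List


def histogram_equalize(pixels: List[int], levels: int = 256) -> List[float]:
    """Global histogram equalization of flattened 8-bit grayscale pixels."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of pixel intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: equalization is undefined, so return it unchanged.
        return [float(p) for p in pixels]
    scale = (levels - 1) / (total - cdf_min)
    return [(cdf[p] - cdf_min) * scale for p in pixels]


def gamma_correct(pixels: List[int], gamma: float = 0.5) -> List[float]:
    """Power-law (gamma) correction; gamma < 1 brightens dark regions."""
    return [255.0 * (p / 255.0) ** gamma for p in pixels]


def weighted_sum_normalize(pixels: List[int], w: float = 0.5,
                           gamma: float = 0.5) -> List[int]:
    """HYPOTHETICAL weighted-sum illumination normalization (assumed form):
    out = w * HE(I) + (1 - w) * Gamma(I), rounded and clipped to [0, 255]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    he = histogram_equalize(pixels)
    gc = gamma_correct(pixels, gamma)
    out = []
    for a, b in zip(he, gc):
        v = w * a + (1.0 - w) * b
        out.append(int(min(255, max(0, round(v)))))
    return out
```

For example, `weighted_sum_normalize([10, 20, 30, 40], w=0.5)` brightens this dark patch to `[25, 78, 129, 178]`. Values of `w` closer to 1 favor the contrast stretch of histogram equalization, while smaller values keep the smoother tonal curve of gamma correction.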

Keywords: facial expression recognition, distinguishing feature filtering, vision transformer (ViT), multi-layer Transformer encoder
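The abstract says that a multi-layer Transformer encoder strengthens the associations among facial features and that a Softmax function produces the prediction, but it gives no layer count or dimensions. The plain-Python sketch below is not the authors' implementation; it only illustrates the core computation under simplifying assumptions: a single attention head, residual connections without LayerNorm or a feed-forward sublayer, mean pooling over tokens, and hypothetical weight matrices (`layers`, `w_cls`). Each token could be read as one spatial position of the CNN feature map.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(v: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention plus a residual:
    out = X + softmax(Q K^T / sqrt(d_k)) V, with Q = X Wq, K = X Wk, V = X Wv."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    attended = matmul(weights, v)
    return [[a + b for a, b in zip(xr, ar)] for xr, ar in zip(x, attended)]


def encode_and_classify(x: Matrix,
                        layers: List[Tuple[Matrix, Matrix, Matrix]],
                        w_cls: Matrix) -> List[float]:
    """HYPOTHETICAL pipeline: stacked attention layers, mean pooling over
    tokens, a linear classifier, then softmax over expression classes."""
    for wq, wk, wv in layers:
        x = self_attention(x, wq, wk, wv)
    pooled = [sum(col) / len(x) for col in zip(*x)]
    logits = matmul([pooled], w_cls)[0]
    return softmax(logits)
```

With identity weight matrices and a symmetric input such as `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, all class logits coincide and the probabilities are each close to 1/3. In a trained model, `w_cls` would map the pooled feature to one logit per expression class.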

FENG Hongqi, HUANG Weikai, ZHANG Denghui. Facial Expression Recognition with Distinguishing Feature Filtering and ViT[J]. Computer Engineering and Applications, 2023, 59(22): 136-143.
