Expression Recognition Based on Convolution Residual Network of Attention Pyramid

doi:10.3778/j.issn.1002-8331.2104-0111

Abstract

Facial expressions are among the most authentic and intuitive ways in which people express their inner emotions, and different expressions are separated by only subtle inter-class differences. Extracting features with strong representational power is therefore a key problem in facial expression recognition. To extract higher-level semantic features, an attention pyramid convolutional residual network model (APRNET50) is proposed on the basis of the residual network (ResNet). The model combines a pyramid convolution module with channel attention and spatial attention. Pyramid convolution first extracts fine-grained feature information from the image; the extracted features are then weighted along the channel and spatial dimensions, and salient regions are located according to the magnitude of the weights; finally, a classifier built from a fully connected layer classifies the expression. The network is trained end to end, which makes it better suited to fine-grained facial expression classification. Experimental results show recognition accuracies of 73.001% on the FER2013 dataset and 94.949% on the CK+ dataset, improvements of 2.091 and 0.279 percentage points, respectively, over existing methods, which is a relatively competitive result.

Key words: residual network, pyramid convolution, attention mechanism, facial expression recognition, feature extraction
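The weighting step described in the abstract (assign weights along the channel dimension, then along the spatial dimension, so that salient regions stand out) can be illustrated with a minimal sketch. This is not the APRNET50 implementation: the abstract gives no layer details, so the learned attention layers are replaced here by a parameter-free sigmoid of pooled averages, the pyramid convolution stage is omitted, and the names `channel_attention`, `spatial_attention`, and `attend` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by the sigmoid of its global average.

    fmap is a nested list indexed as [channel][row][col].
    Assumption: a real model would learn this weight; here it is
    computed directly from the pooled average.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each position by the sigmoid of its mean across channels."""
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    weights = [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[fmap[c][i][j] * weights[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(n_ch)
    ]


def attend(fmap):
    """Apply channel weighting first, then spatial weighting."""
    return spatial_attention(channel_attention(fmap))


# Toy 2-channel, 2x2 feature map (illustrative values only).
features = [
    [[4.0, 0.0], [0.0, 0.0]],  # channel 0: strong response at top-left
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1: no response
]
refined = attend(features)
```

In this toy input the only non-zero activation stays non-zero but is scaled down, since both weights lie in (0, 1); positions with no response remain zero. In a trained network the weights would come from learned layers rather than from fixed pooling.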

CHEN Jiamin, XU Yang. Expression Recognition Based on Convolution Residual Network of Attention Pyramid[J]. Computer Engineering and Applications, 2022, 58(22): 123-131.
