Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (23): 214-220. DOI: 10.3778/j.issn.1002-8331.2105-0422

• Pattern Recognition and Artificial Intelligence •

Expression Recognition Based on Global Attention and Pyramidal Convolution Network

MAO Junyu, HE Tingnian, GUO Yi, LI Aibin   

  1. College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China
  • Online:2022-12-01 Published:2022-12-01

Abstract: Deep-learning-based facial expression recognition has made considerable progress in recent years, but multi-scale extraction of expression features and recognition in unconstrained real-world scenes remain challenging. To address these problems, an expression recognition method that combines a pyramidal convolutional neural network with an attention mechanism is proposed. An input facial expression image is first cropped into several sub-images by region sampling. The original image and the sub-images are then fed into the pyramidal convolutional neural network for multi-scale feature extraction, and the extracted feature maps are passed to a global attention module, which assigns a weight to each image so that images carrying important feature information are emphasized. Finally, the features of the sub-images and the original image are combined by weighted summation into a new global feature that contains attention information, and this feature is used for expression classification. The method achieves accuracies of 98.46%, 87.34%, and 60.45% on the public expression datasets CK+, RAF-DB, and AffectNet, respectively, improving expression recognition accuracy.

Key words: expression recognition, pyramid convolution, attention mechanism, residual network
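
The pipeline summarized in the abstract (region cropping, per-image feature extraction, attention weighting, weighted summation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The crop layout (four corners plus center), the `crop_frac` value, the softmax scoring with a single vector `score_vec`, and the function names are all assumptions made for illustration, and `toy_multiscale_features` is only a hypothetical stand-in for the pyramidal convolutional network: it pools the image on grids of several sizes instead of applying learned multi-scale kernels.

```python
import numpy as np


def crop_regions(image, crop_frac=0.75):
    """Return the original image followed by five sub-images.

    Assumed scheme: four corner crops and one center crop, each covering
    crop_frac of the height and width. The abstract only says "region sampling".
    """
    h, w = image.shape
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    tops = [0, 0, h - ch, h - ch, (h - ch) // 2]
    lefts = [0, w - cw, 0, w - cw, (w - cw) // 2]
    crops = [image[t:t + ch, l:l + cw] for t, l in zip(tops, lefts)]
    return [image] + crops


def toy_multiscale_features(image, grids=(1, 2, 4)):
    """Hypothetical stand-in for the pyramidal CNN.

    Average-pools the image on g x g grids for each g in grids, so every
    input size maps to a vector of fixed length sum(g * g).
    """
    h, w = image.shape
    feats = []
    for g in grids:
        ys = np.linspace(0, h, g + 1).astype(int)
        xs = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                feats.append(image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean())
    return np.array(feats)


def attention_fuse(features, score_vec):
    """Weight each image's feature vector and sum them into a global feature.

    features: (n_images, d) array; score_vec: (d,) scoring vector, which
    would be learned in a real model. Softmax normalization is an assumption.
    """
    scores = features @ score_vec
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    fused = weights @ features      # weighted sum over the images
    return fused, weights


rng = np.random.default_rng(0)
face = rng.random((48, 48))                   # placeholder grayscale face image
views = crop_regions(face)                    # original image + 5 sub-images
feats = np.stack([toy_multiscale_features(v) for v in views])
global_feat, attn = attention_fuse(feats, rng.standard_normal(feats.shape[1]))
```

In a trained model, `global_feat` would be passed to a classification layer that predicts the expression category; here the scoring vector is random, so the attention weights carry no meaning beyond showing the data flow.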