Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (22): 123-131. DOI: 10.3778/j.issn.1002-8331.2104-0111

• Pattern Recognition and Artificial Intelligence •


Expression Recognition Based on Convolution Residual Network of Attention Pyramid

CHEN Jiamin, XU Yang   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2022-11-15  Published: 2022-11-15



Abstract: Facial expression is one of the most authentic and intuitive ways of expressing human inner emotions, and different expressions differ only by subtle inter-class cues. Extracting features with strong representational power is therefore a key issue in facial expression recognition. To extract higher-level semantic features, an attention pyramid convolutional residual network model (APRNET50) built on the residual network (ResNet) is proposed, which integrates a pyramid convolution module, channel attention, and spatial attention. First, pyramid convolution extracts fine-grained feature information from the image; the extracted features are then assigned weights along the channel and spatial dimensions, and salient regions are located according to these weights; finally, a fully connected layer serves as the classifier for the expressions. Trained in an end-to-end manner, the proposed network is well suited to fine-grained facial expression classification. Experimental results show that the recognition accuracy reaches 73.001% on FER2013 and 94.949% on CK+, improvements of 2.091 and 0.279 percentage points, respectively, over existing methods, which is a relatively competitive result.
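The channel and spatial weighting described in the abstract can be sketched in a few lines of NumPy. This is a simplified, dependency-free illustration of CBAM-style attention only, not the paper's implementation: the function names, the shared two-layer MLP weights `w1`/`w2`, and the elementwise stand-in for the 7×7 spatial convolution are all my assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel of feat (C, H, W) by a learned scalar in (0, 1).

    Average- and max-pool the spatial dimensions, pass each descriptor
    through a shared two-layer MLP (w1, w2), and sum before the sigmoid.
    """
    avg = feat.mean(axis=(1, 2))                     # (C,) average descriptor
    mx = feat.max(axis=(1, 2))                       # (C,) max descriptor
    gate = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0) +
                   w2 @ np.maximum(w1 @ mx, 0.0))    # (C,) channel weights
    return feat * gate[:, None, None]                # reweight each channel

def spatial_attention(feat):
    """Gate each spatial location of feat (C, H, W) by a scalar in (0, 1).

    Pool across channels; a real model would convolve [avg; max] with a
    7x7 kernel — here their mean stands in to keep the sketch minimal.
    """
    avg = feat.mean(axis=0)                          # (H, W)
    mx = feat.max(axis=0)                            # (H, W)
    gate = sigmoid((avg + mx) / 2.0)                 # (H, W) spatial weights
    return feat * gate[None, :, :]                   # reweight each location
```

Because every gate lies in (0, 1), each stage only attenuates less salient channels or locations; applied after a pyramid-convolution block, the two gates together direct the network toward expression-relevant regions before the fully connected classifier.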

Key words: residual network, pyramid convolution, attention mechanism, facial expression recognition, feature extraction