Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (22): 242-250. DOI: 10.3778/j.issn.1002-8331.2206-0245

• Graphics and Image Processing •


Multi-Scale Coordinate Attention Pyramid Convolution for Facial Expression Recognition

NI Jinyuan, ZHANG Jianxun   

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Online: 2023-11-15 Published: 2023-11-15



Abstract: To address the insufficient feature-extraction ability and slow computation of traditional convolutional neural networks on facial expressions, a pyramidal convolution model with multi-scale fused attention is proposed in this paper. To reduce the number of network parameters, improve computational speed, and enlarge the model's receptive field, the pyramidal convolution structure is improved. To represent facial expression features at multiple scales and strengthen the model's representation of facial features, an SECA coordinate attention module is proposed. To reduce the computational cost of the network, alleviate model redundancy, and promote information fusion across channels, a depthwise separable shuffle method is proposed. Experimental results show that the model achieves accuracies of 72.89%, 98.55% and 94.37% on the public datasets FER2013, CK+ and JAFFE, respectively, with 1.958×10^7 parameters. Compared with other networks, the proposed network achieves higher recognition accuracy while maintaining a faster computation speed.
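The abstract does not detail the depthwise separable shuffle method, but its channel-fusion step can be illustrated with a minimal NumPy sketch of a ShuffleNet-style channel shuffle (the function name and the reshape-transpose-reshape formulation below are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Permute channels of an NCHW tensor so that channels from
    different groups are interleaved, promoting cross-group fusion."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # split channels into groups, swap the group and per-group axes, re-flatten
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# tag each channel with its index to make the permutation visible
x = np.arange(6, dtype=float).reshape(1, 6, 1, 1)
y = channel_shuffle(x, groups=2)
print(y.flatten())  # channel order becomes 0, 3, 1, 4, 2, 5
```

After grouped (depthwise) convolutions, each output channel only sees inputs from its own group; interleaving the channels this way lets the following pointwise convolution mix information across groups at no parameter cost.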

Key words: pyramidal convolution, facial features, attention, depthwise separable shuffle