Computer Engineering and Applications, 2025, Vol. 61, Issue (15): 251-257. DOI: 10.3778/j.issn.1002-8331.2404-0349

• Pattern Recognition and Artificial Intelligence •

Facial Expression Recognition Based on Dual-Stream Feature Cross-Fusion Efficient Transformer

DANG Hongshe (党宏社), MENG Raochen (孟饶辰), GAO Wanrong (高宛蓉)

  1. School of Electrical and Control Engineering, Shaanxi University of Science & Technology, Xi’an 710021, China
  • Online: 2025-08-01  Published: 2025-07-31

Abstract: Facial expression recognition is receiving increasing attention in practical applications such as human-computer interaction. To address the low recognition accuracy that traditional methods suffer from because of inter-class similarity and intra-class discrepancy, a dual-stream feature cross-fusion Efficient Transformer method for facial expression recognition is proposed. IResNet50 and MobileFaceNet extract multi-scale features from the facial image and from the facial landmarks, respectively, and a channel attention mechanism is used to enhance key features and reduce the number of parameters. A cross fusion efficient multi-head self-attention (CFEMSA) mechanism then cross-fuses the dual-stream features of the same scale to highlight salient facial features. Finally, a feature pyramid structure performs multi-scale fusion of the cross-fusion results obtained at different scales to improve recognition accuracy. The proposed method achieves recognition accuracies of 91.82%, 67.46%, and 63.65% on the RAF-DB, AffectNet-7, and AffectNet-8 datasets, respectively. The experimental results show that the method effectively alleviates the low recognition accuracy caused by inter-class similarity and intra-class discrepancy.

Key words: facial expression recognition, Efficient Transformer, cross fusion, multi-scale features, feature fusion
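The abstract does not say which channel attention mechanism is used. As an illustration only, the following NumPy sketch assumes a Squeeze-and-Excitation (SE) style block: each channel of a (C, H, W) feature map is average-pooled, passed through a bottleneck of two small fully connected layers, and squashed by a sigmoid into a gate that rescales that channel. The function name `se_channel_attention`, the reduction ratio `r`, and the random weights are hypothetical, and the sketch does not reproduce the paper's parameter-reduction scheme.

```python
import numpy as np


def se_channel_attention(x, w_reduce, w_expand):
    """Squeeze-and-Excitation style channel gating (an assumed stand-in).

    x:        feature map of shape (C, H, W)
    w_reduce: (C // r, C) weights of the bottleneck layer
    w_expand: (C, C // r) weights of the expansion layer
    Returns the re-weighted map and the per-channel gates in (0, 1).
    """
    squeeze = x.mean(axis=(1, 2))                        # global average pooling -> (C,)
    hidden = np.maximum(w_reduce @ squeeze, 0.0)         # bottleneck FC + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # FC + sigmoid -> (C,)
    return x * gates[:, None, None], gates               # scale each channel by its gate


# Hypothetical toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w_reduce = 0.1 * rng.standard_normal((C // r, C))
w_expand = 0.1 * rng.standard_normal((C, C // r))
y, gates = se_channel_attention(x, w_reduce, w_expand)
print(y.shape, gates.shape)  # (8, 4, 4) (8,)
```

With reduction ratio r, the two layers add only 2C²/r weights (ignoring biases), which is why such a bottleneck is a common, lightweight way to re-weight channels.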
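The abstract names CFEMSA but does not define its internals. A minimal sketch, assuming that "cross fusion" means cross-attention (queries from one stream, keys and values from the other) and that "efficient" means spatially reducing the keys and values before attention, could look like the following. The average-pooling reduction, the bidirectional residual fusion, and all function names (`reduce_tokens`, `cross_attention`, `cross_fuse`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def reduce_tokens(tokens, stride):
    """Average-pool N tokens laid out on a square grid by `stride` (assumed spatial reduction)."""
    n, d = tokens.shape
    side = int(round(np.sqrt(n)))
    grid = tokens.reshape(side // stride, stride, side // stride, stride, d)
    return grid.mean(axis=(1, 3)).reshape(-1, d)


def cross_attention(q_tokens, kv_tokens, wq, wk, wv, wo, heads, stride):
    """Queries from one stream attend to spatially reduced keys/values of the other."""
    n, d = q_tokens.shape
    dh = d // heads
    kv = reduce_tokens(kv_tokens, stride)                          # (m, d), m = n / stride**2
    q = (q_tokens @ wq).reshape(n, heads, dh).transpose(1, 0, 2)    # (heads, n, dh)
    k = (kv @ wk).reshape(-1, heads, dh).transpose(1, 0, 2)         # (heads, m, dh)
    v = (kv @ wv).reshape(-1, heads, dh).transpose(1, 0, 2)         # (heads, m, dh)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))          # (heads, n, m)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)               # concatenate heads
    return out @ wo, attn


def cross_fuse(img_tokens, lmk_tokens, w_img, w_lmk, heads=2, stride=2):
    """Fuse two same-scale streams in both directions, with residual connections."""
    img_out, attn_img = cross_attention(img_tokens, lmk_tokens, *w_img, heads, stride)
    lmk_out, _ = cross_attention(lmk_tokens, img_tokens, *w_lmk, heads, stride)
    return img_tokens + img_out, lmk_tokens + lmk_out, attn_img


# Hypothetical toy inputs: a 4x4 token grid with 8 channels per stream.
rng = np.random.default_rng(0)
n, d = 16, 8
img = rng.standard_normal((n, d))
lmk = rng.standard_normal((n, d))
w_img = [0.1 * rng.standard_normal((d, d)) for _ in range(4)]   # wq, wk, wv, wo
w_lmk = [0.1 * rng.standard_normal((d, d)) for _ in range(4)]
fused_img, fused_lmk, attn = cross_fuse(img, lmk, w_img, w_lmk)
print(fused_img.shape, attn.shape)  # (16, 8) (2, 16, 4)
```

With a reduction stride of 2, each query attends to n/4 pooled tokens instead of n, so the per-head attention matrix in this example shrinks from 16 x 16 to 16 x 4.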
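The feature pyramid used for multi-scale fusion is likewise not detailed in the abstract. The sketch below assumes an FPN-style top-down pathway: starting from the coarsest cross-fusion output, each map is upsampled by nearest-neighbour interpolation and added to the next finer one, and every fused level is then global-average-pooled into a single descriptor that a classifier could map to the 7 or 8 expression classes. The function names, the equal channel counts across levels, and the factor-of-2 spacing between scales are assumptions.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def pyramid_fuse(features):
    """Top-down fusion of maps ordered fine -> coarse (each level half the size).

    Assumes every level already has the same channel count C, so the
    1x1 lateral convolutions of a full FPN are omitted.
    """
    fused = [features[-1]]                       # start from the coarsest level
    for f in reversed(features[:-1]):
        fused.append(f + upsample2x(fused[-1]))  # add upsampled coarser context
    fused = fused[::-1]                          # back to fine -> coarse order
    # Global-average-pool every level and concatenate into one descriptor.
    descriptor = np.concatenate([p.mean(axis=(1, 2)) for p in fused])
    return fused, descriptor


# Hypothetical constant inputs make the top-down sums easy to check by hand.
C = 4
levels = [np.ones((C, 8, 8)), np.ones((C, 4, 4)), np.ones((C, 2, 2))]
fused, descriptor = pyramid_fuse(levels)
print([p.shape for p in fused])   # [(4, 8, 8), (4, 4, 4), (4, 2, 2)]
print(descriptor)                 # [3. 3. 3. 3. 2. 2. 2. 2. 1. 1. 1. 1.]
```

Because each finer level receives the accumulated context of all coarser levels, the finest map ends up with value 3 here (its own 1 plus 1 from each of the two coarser levels).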