Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (20): 182-193. DOI: 10.3778/j.issn.1002-8331.2411-0297

• Pattern Recognition and Artificial Intelligence •

基于图神经网络的多域特征融合表情识别算法研究

张俊杰,费程,何伏刚   

  1. 中国人民公安大学 信息网络安全学院,北京 102623
  2. 首都师范大学 美术学院,北京 100089
  3. 中国人民公安大学 警体战训学院,北京 100038
  • Publication date: 2025-10-15  Release date: 2025-10-15

Multi-Domain Feature Fusion for Facial Expression Recognition Based on Graph Neural Networks

ZHANG Junjie, FEI Cheng, HE Fugang   

  1. School of Information Network Security, People’s Public Security University of China, Beijing 102623, China
  2. College of Fine Arts, Capital Normal University, Beijing 100089, China
  3. School of Police Law Enforcement Abilities Training, People’s Public Security University of China, Beijing 100038, China
  • Online: 2025-10-15  Published: 2025-10-15

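Abstract: Existing deep models require large amounts of training data and cannot recognize both macro-expressions and micro-expressions with a single network architecture. To address this, a graph neural network expression recognition algorithm based on feature-level fusion of the spatial and spectral domains is proposed, named S2GNN (spatial and spectral graph neural networks). Image pixels are taken as nodes and the distances between nodes as edges, so that the face is modeled as an undirected weighted graph whose distance information is stored in an adjacency matrix. Once the adjacency matrix is obtained, the image is, on the one hand, transformed from the spatial domain to the spectral domain by the graph Fourier transform and convolved there to obtain its spectral-domain features. On the other hand, each node and its adjacent nodes are convolved in the spatial domain on the undirected weighted graph to obtain the image's spatial-domain features. The proposed spatial-feature-guided feature selection module then fuses the spatial and spectral features, and expression recognition is performed on the fused features. Experimental results show that combining spatial and spectral features improves recognition accuracy and that high classification accuracy is achieved even with little training data. Tested on eight public datasets, the method reaches an average accuracy of 89.11%. On mixed macro- and micro-expression data, the unweighted F1 score and unweighted average recall are 0.954 and 0.970, respectively. Further analysis of the results shows that the eye region plays an important role in improving expression recognition accuracy under occlusion.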

Keywords: graph neural networks, expression recognition, micro-expression recognition, spatial feature guidance
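
The pipeline outlined in the abstract (pixels as nodes, an adjacency matrix of distances, a graph Fourier transform for the spectral branch, and neighbor aggregation for the spatial branch) can be illustrated with a minimal NumPy sketch. This is not the S2GNN implementation. The Gaussian distance kernel, the connection radius, the combinatorial Laplacian, the low-pass filter response, and the GCN-style normalization are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def pixel_graph(height, width, sigma=1.0, radius=1.5):
    """Adjacency matrix of an undirected weighted graph over a pixel grid.

    Assumption (not from the paper): edge weights are a Gaussian kernel of
    the Euclidean distance between pixel coordinates, and only pixels closer
    than `radius` are connected (an 8-neighborhood for radius=1.5).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    weights = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    # Drop self-loops and edges between distant pixels.
    weights[(dist > radius) | (dist == 0.0)] = 0.0
    return weights


def graph_fourier_basis(adjacency):
    """Eigen-decomposition of the combinatorial Laplacian L = D - W."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return eigvals, eigvecs


def spectral_filter(signal, eigvals, eigvecs, response):
    """Filter a node signal in the spectral domain."""
    spectrum = eigvecs.T @ signal            # graph Fourier transform
    filtered = response(eigvals) * spectrum  # pointwise spectral filtering
    return eigvecs @ filtered                # inverse graph Fourier transform


def spatial_aggregate(signal, adjacency):
    """One spatial step: each node mixes with its neighbors.

    Assumption: symmetric normalization with added self-loops, as in
    common GCN formulations.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return (a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ signal


# Toy 4x4 "image" with random intensities (illustrative data only).
rng = np.random.default_rng(0)
image = rng.random((4, 4))
x = image.ravel()

W = pixel_graph(4, 4)
lam, U = graph_fourier_basis(W)
x_spec = spectral_filter(x, lam, U, lambda l: np.exp(-0.5 * l))  # low-pass
x_spat = spatial_aggregate(x, W)
```

In this sketch `x_spec` and `x_spat` stand in for the spectral-domain and spatial-domain features. The fusion module that combines them is not specified in enough detail in the abstract to sketch here.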

Abstract: To address the difficulty of recognizing macro-expressions and micro-expressions simultaneously with deep learning when only a small amount of training data is available, a multi-domain feature fusion framework based on graph neural networks, named S2GNN (spatial and spectral graph neural networks), is proposed. First, pixels are represented as nodes and the distances between nodes as edges, so that the face is modeled as an undirected weighted graph whose distances are stored in an adjacency matrix. Once the adjacency matrix is obtained, images are, on the one hand, transformed from the spatial domain to the spectral domain by the graph Fourier transform and convolved there to obtain spectral features. On the other hand, each node and its adjacent nodes are convolved on the graph in the spatial domain to obtain spatial features. A spatial-feature-guided feature selection module is then proposed to fuse the spatial and spectral features. Finally, expressions are recognized from the fused features. Results show that combining spatial and spectral features improves expression recognition accuracy and that high accuracy can be achieved with a small amount of training data. An average accuracy of 89.11% is reached on eight public datasets. On mixed macro- and micro-expression data, the unweighted F1 score and unweighted average recall are 0.954 and 0.970, respectively. Furthermore, the eye region plays an important role in improving facial expression recognition accuracy under occlusion.

Keywords: graph neural networks, facial expression recognition, micro-expression recognition, spatial feature guidance
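
The two metrics reported for the mixed macro- and micro-expression data can be made concrete with a short sketch. It assumes the usual definitions: unweighted F1 (UF1) is the mean of the per-class F1 scores and unweighted average recall (UAR) is the mean of the per-class recalls, so every class counts equally regardless of its sample size. The function name `uf1_uar` and the labels below are hypothetical toy values, not data from the paper.

```python
import numpy as np


def uf1_uar(y_true, y_pred, num_classes):
    """Return (UF1, UAR): per-class F1 and recall, averaged without weighting."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_scores, recalls = [], []
    for c in range(num_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        # Per-class F1 written as 2TP / (2TP + FP + FN); 0 if the class is absent.
        f1_den = 2 * tp + fp + fn
        f1_scores.append(2 * tp / f1_den if f1_den else 0.0)
        # Per-class recall TP / (TP + FN); 0 if the class has no true samples.
        rec_den = tp + fn
        recalls.append(tp / rec_den if rec_den else 0.0)
    return float(np.mean(f1_scores)), float(np.mean(recalls))


# Toy labels for three expression classes (illustrative only).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
uf1, uar = uf1_uar(y_true, y_pred, num_classes=3)
```

For these toy labels the per-class F1 scores are 0.8, 0.8 and 1.0 and the per-class recalls are 2/3, 1 and 1, so the sketch yields UF1 of about 0.867 and UAR of about 0.889.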