Computer Engineering and Applications (计算机工程与应用), 2023, Vol. 59, Issue (22): 136-143. DOI: 10.3778/j.issn.1002-8331.2207-0420

• Pattern Recognition and Artificial Intelligence •

Facial Expression Recognition with Distinguishing Feature Filtering and ViT (结合显著特征筛选和ViT的面部表情识别方法)

FENG Hongqi (封红旗), HUANG Weikai (黄伟铠), ZHANG Denghui (张登辉)

  1. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu 213100, China
  2. College of Information Technology, Zhejiang Shuren University, Hangzhou 310000, China
  • Online: 2023-11-15 Published: 2023-11-15

Abstract: In real human-computer interaction scenarios, people's dynamic behaviors (turning the head, walking, etc.) and unstable light sources prevent facial detail features from being extracted effectively, which reduces the accuracy of facial expression recognition. To address this problem, an optimized model combining distinguishing feature filtering and the vision transformer (ViT) is proposed. A weighted-sum illumination normalization method balances the brightness of the original image, and a convolutional neural network extracts facial features. A distinguishing feature filtering module then aggregates local-global facial context information. A multi-layer Transformer encoder strengthens the associations between facial features. Finally, a Softmax function predicts the facial expression. Experimental results show that the model achieves good performance on the RAF-DB, FERPlus, and AffectNet datasets.

Key words: facial expression recognition, distinguishing feature filtering, vision transformer, multi-layer Transformer encoder
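
The first and last stages of the pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the weighted sum is formed, so this sketch assumes it blends the original grayscale image with its histogram-equalized version using a hypothetical weight `alpha`; the paper's actual formulation may differ. The function names, the `alpha` parameter, and the example labels are all illustrative assumptions, and the CNN, feature filtering module, and Transformer encoder are not modeled here.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[int]]  # 8-bit grayscale image as rows of pixel values


def equalize_histogram(gray: Image) -> Image:
    """Standard histogram equalization for an 8-bit grayscale image."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    # Cumulative count of pixels at or below each intensity.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:  # empty or single-intensity image: nothing to spread
        return [row[:] for row in gray]
    scale = total - cdf_min
    return [[round((cdf[v] - cdf_min) * 255 / scale) for v in row] for row in gray]


def weighted_sum_normalize(gray: Image, alpha: float = 0.5) -> Image:
    """ASSUMED form of weighted-sum normalization:
    alpha * equalized + (1 - alpha) * original, per pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    equalized = equalize_histogram(gray)
    return [
        [round(alpha * e + (1.0 - alpha) * o) for e, o in zip(eq_row, row)]
        for eq_row, row in zip(equalized, gray)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_expression(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    image = [[0, 50], [100, 150]]
    print(weighted_sum_normalize(image, alpha=0.5))
    # Illustrative logits and labels only; not outputs of the paper's model.
    print(predict_expression([0.1, 2.0, -1.0], ["happiness", "sadness", "neutral"]))
```

With `alpha = 0` the image is returned unchanged and with `alpha = 1` it is fully equalized, so the weight controls how aggressively brightness is rebalanced before feature extraction.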