Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (6): 140-146.DOI: 10.3778/j.issn.1002-8331.1811-0332


Research on Audio-Visual Dual-Modal Emotion Recognition Fusion Framework

SONG Guanjun, ZHANG Shudong, WEI Feigao   

  1. College of Information Engineering, Capital Normal University, Beijing 100048, China
  • Online: 2020-03-15  Published: 2020-03-13





Aiming at the low recognition rate and poor reliability of dual-modal emotion recognition frameworks, this paper studies the fusion of the two most important modalities in dual-modal emotion recognition: speech and facial expression. Features of the pre-processed audio and video signals are extracted with a prior-knowledge-based method and a VGGNet-19 network, respectively. Feature-level fusion is achieved by direct cascade, with dimensionality reduction through PCA. A BLSTM network is then used to build the model and complete emotion recognition. The framework is tested on the AViD-Corpus and SEMAINE databases and compared with the traditional feature-level fusion framework for emotion recognition and with frameworks based on VGGNet-19 or BLSTM alone. The experimental results show that the Root Mean Square Error (RMSE) of emotion recognition is reduced and the Pearson Correlation Coefficient (PCC) is improved, which verifies the effectiveness of the proposed method.
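The fusion and evaluation steps described in the abstract can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the feature dimensions and the synthetic data are assumptions, the BLSTM regressor itself is omitted, and PCA is done directly via SVD. It shows the direct cascade (concatenation) of audio and visual feature streams, PCA dimensionality reduction, and the two reported metrics, RMSE and PCC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features (sizes are illustrative, not from the
# paper): 100 aligned time steps of 32-dim audio and 64-dim visual features.
audio_feats = rng.standard_normal((100, 32))
video_feats = rng.standard_normal((100, 64))

# Feature-level fusion by direct cascade: concatenate along the feature axis.
fused = np.concatenate([audio_feats, video_feats], axis=1)  # shape (100, 96)

def pca_reduce(x, n_components):
    """Project x onto its top n_components principal components."""
    x_centered = x - x.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:n_components].T

reduced = pca_reduce(fused, 40)  # shape (100, 40), fed to the recognizer

# Continuous-emotion metrics used in the paper:
# RMSE (lower is better) and Pearson correlation (higher is better).
def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def pcc(pred, target):
    return float(np.corrcoef(pred, target)[0, 1])

target = rng.standard_normal(100)                # ground-truth emotion trace
pred = target + 0.1 * rng.standard_normal(100)   # a hypothetical model output

print(reduced.shape)
print(rmse(pred, target), pcc(pred, target))
```

In the paper's pipeline, the PCA-reduced fused features would be fed to a BLSTM network that regresses the continuous emotion labels; the two metric functions above correspond to how such predictions are scored against ground truth.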

Key words: audio-visual, dual-modal, feature-level fusion, emotion recognition, BLSTM


