Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (6): 140-146.DOI: 10.3778/j.issn.1002-8331.1811-0332


Research on Audio-Visual Dual-Modal Emotion Recognition Fusion Framework

SONG Guanjun, ZHANG Shudong, WEI Feigao   

  1. College of Information Engineering, Capital Normal University, Beijing 100048, China
  Online: 2020-03-15    Published: 2020-03-13


Abstract:

To address the low recognition rate and poor reliability of dual-modal emotion recognition frameworks, this paper studies feature-level fusion of the two most informative modalities, speech and facial expression. A feature extraction method based on prior knowledge and a VGGNet-19 network are used to extract features from the pre-processed audio and video signals, respectively. Feature fusion is achieved by direct concatenation followed by PCA dimensionality reduction, and a BLSTM network is used to build the model that performs emotion recognition. The framework is tested on the AViD-Corpus and SEMAINE databases and compared with a traditional feature-level fusion framework for emotion recognition as well as with frameworks based on VGGNet-19 or BLSTM alone. The experimental results show that the Root Mean Square Error (RMSE) of emotion recognition is reduced and the Pearson Correlation Coefficient (PCC) is improved, verifying the effectiveness of the proposed method.

Key words: audio-visual, dual-modal, feature-level fusion, emotion recognition, BLSTM
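The fusion step described in the abstract, direct concatenation of audio and video features followed by PCA, and the two evaluation metrics (RMSE, PCC), can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, sample count, and random data are hypothetical, and PCA is computed directly via SVD rather than any particular library routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200
audio_feats = rng.normal(size=(n_samples, 64))   # stand-in for prior-knowledge audio features
video_feats = rng.normal(size=(n_samples, 128))  # stand-in for VGGNet-19 video features

# Feature-level fusion: direct cascade (concatenation) of the two modalities
fused = np.concatenate([audio_feats, video_feats], axis=1)

# PCA via SVD on mean-centered data, keeping k principal components
k = 32
centered = fused - fused.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:k].T  # (n_samples, k) fused, reduced representation

# Evaluation metrics used in the paper
def rmse(y_true, y_pred):
    """Root Mean Square Error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcc(y_true, y_pred):
    """Pearson Correlation Coefficient between two 1-D arrays."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

The reduced representation would then feed a BLSTM sequence model (omitted here, since the network architecture and hyperparameters are not specified in the abstract).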
