Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (10): 135-140. DOI: 10.3778/j.issn.1002-8331.1802-0089


Speech Emotion Recognition Model Based on Parameter Transfer and Convolutional Recurrent Neural Network

MIAO Yuqing1, ZOU Wei1, LIU Tonglai1, ZHOU Ming2, CAI Guoyong1   

  1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guilin Hivision Technology Co. Ltd., Guilin, Guangxi 541004, China
  • Online: 2019-05-15 Published: 2019-05-13


缪裕青1,邹  巍1,刘同来1,周  明2,蔡国永1   

  1. 桂林电子科技大学 计算机与信息安全学院,广西 桂林 541004
  2. 桂林海威科技股份有限公司,广西 桂林 541004

Abstract: In speech emotion recognition, most existing deep learning methods do not model the time-frequency characteristics of speech; they also suffer from long training times and limited recognition accuracy. A spectrogram is an image derived from a speech signal that carries information in both the time and frequency domains. To fully extract the emotional features in both domains of the spectrogram, this paper proposes a speech emotion recognition model based on parameter transfer and a convolutional recurrent neural network. The model takes the spectrogram as the network input, introduces the AlexNet model, and transfers the weight parameters of its pre-trained convolutional layers. The feature maps output by the convolutional neural network are reconstructed and then fed into a long short-term memory (LSTM) network for training. Experimental results show that the proposed method speeds up network training and improves emotion recognition accuracy.
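
The data flow described in the abstract (speech signal, spectrogram, convolutional feature maps, reconstructed sequence for the LSTM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The frame length, hop size, synthetic test tone, and the rule for reconstructing feature maps (treating the width axis as time) are all assumptions made for illustration, and the function names `spectrogram` and `feature_maps_to_sequence` are hypothetical. Random numbers stand in for the output of the transferred AlexNet convolutional layers.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram with shape (freq_bins, time_frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (time, freq)
    return np.log1p(magnitude).T                     # (freq, time)


def feature_maps_to_sequence(feature_maps):
    """Reshape feature maps (channels, freq, time) into (time, channels * freq).

    Assumption: the width axis of the feature maps is treated as time, so
    each time step becomes one feature vector for the recurrent network.
    """
    channels, freq, time = feature_maps.shape
    return feature_maps.transpose(2, 0, 1).reshape(time, channels * freq)


# One second of a synthetic 440 Hz tone at 16 kHz stands in for speech.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))

# Placeholder for the convolutional output: 256 channels on a 6 x 6 grid,
# an assumed shape chosen only for this example.
fake_maps = np.random.default_rng(0).standard_normal((256, 6, 6))
sequence = feature_maps_to_sequence(fake_maps)

print(spec.shape, sequence.shape)  # (129, 124) (6, 1536)
```

In a full model, `sequence` would be fed to an LSTM layer, and `fake_maps` would come from convolutional layers initialized with pre-trained weights rather than from a random generator.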

Key words: spectrogram, deep learning, parameter transfer, convolutional recurrent neural network, speech emotion recognition

摘要: 在语音情感识别研究中,已有基于深度学习的方法大多没有针对语音时频两域的特征进行建模,且存在网络模型训练时间长、识别准确性不高等问题。语谱图是语音信号转换后具有时频两域的特殊图像,为了充分提取语谱图时频两域的情感特征,提出了一种基于参数迁移和卷积循环神经网络的语音情感识别模型。该模型把语谱图作为网络的输入,引入AlexNet网络模型并迁移其预训练的卷积层权重参数,将卷积神经网络输出的特征图重构后输入LSTM(Long Short-Term Memory)网络进行训练。实验结果表明,所提方法加快了网络训练的速度,并提高了情感识别的准确率。

关键词: 语谱图, 深度学习, 参数迁移, 卷积循环神经网络, 语音情感识别