Speech Emotion Recognition Model Based on Parameter Transfer and Convolutional Recurrent Neural Network

doi:10.3778/j.issn.1002-8331.1802-0089

Computer Engineering and Applications, 2019, Vol. 55, Issue (10): 135-140. DOI: 10.3778/j.issn.1002-8331.1802-0089

MIAO Yuqing¹, ZOU Wei¹, LIU Tonglai¹, ZHOU Ming², CAI Guoyong¹

1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2. Guilin Hivision Technology Co. Ltd., Guilin, Guangxi 541004, China

Online: 2019-05-15 Published: 2019-05-13

Abstract

In speech emotion recognition, most existing deep-learning methods do not model the characteristics of speech in both the time and frequency domains, and they suffer from long network training times and low recognition accuracy. A spectrogram is a special image, obtained by transforming the speech signal, that carries both time-domain and frequency-domain information. To fully extract the emotional features in both domains of the spectrogram, this paper proposes a speech emotion recognition model based on parameter transfer and a convolutional recurrent neural network. The model takes the spectrogram as the network input, introduces the AlexNet network model and transfers the weight parameters of its pre-trained convolutional layers, then reshapes the feature maps output by the convolutional neural network and feeds them into an LSTM (Long Short-Term Memory) network for training. Experimental results show that the proposed method speeds up network training and improves emotion recognition accuracy.

Key words: spectrogram, deep learning, parameter transfer, convolutional recurrent neural network, speech emotion recognition
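The data flow described in the abstract (speech signal to spectrogram, then convolutional feature maps reshaped into a sequence for the LSTM) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names `spectrogram` and `feature_maps_to_sequence`, the frame length, hop size, Hann window, log-magnitude scaling, and the assumed `(channels, frequency, time)` feature-map layout are all illustrative assumptions. The AlexNet convolutional layers and the LSTM itself are omitted; the spectrogram is passed through as a single-channel stand-in for the CNN output.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram: rows are frequency bins, columns are frames.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    # Hann window reduces spectral leakage at the frame boundaries.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(frame_len^2); acceptable for a sketch, use an FFT in practice.
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(math.log(abs(acc) + 1e-10))
        frames.append(mags)
    # Transpose from (time, frequency) to (frequency, time).
    return [list(row) for row in zip(*frames)]


def feature_maps_to_sequence(feature_maps):
    """Reshape (channels, frequency, time) maps into (time, channels * frequency).

    Assumed layout: each time column becomes one input step for the LSTM.
    """
    channels = len(feature_maps)
    freq = len(feature_maps[0])
    steps = len(feature_maps[0][0])
    return [[feature_maps[c][f][t] for c in range(channels) for f in range(freq)]
            for t in range(steps)]


# Synthetic 1 kHz tone sampled at 8 kHz stands in for a speech recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]

spec = spectrogram(tone)                 # 33 frequency bins x 15 frames
seq = feature_maps_to_sequence([spec])   # 15 time steps x 33 features
print(len(spec), len(spec[0]), len(seq), len(seq[0]))
```

Ordering the sequence along the time axis is one plausible reading of "reshaping" the feature maps: it lets a recurrent layer consume the frequency content frame by frame, which matches the stated goal of modelling both domains of the spectrogram.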

MIAO Yuqing, ZOU Wei, LIU Tonglai, ZHOU Ming, CAI Guoyong. Speech Emotion Recognition Model Based on Parameter Transfer and Convolutional Recurrent Neural Network[J]. Computer Engineering and Applications, 2019, 55(10): 135-140.
