Speech synthesis using simplified LSTM

CHEN Zhousi, HU Wenxin

Computer Center, East China Normal University, Shanghai 200062, China

Computer Engineering and Applications, 2018, Vol. 54, Issue (3): 131-135. DOI: 10.3778/j.issn.1002-8331.1608-0332

Online: 2018-02-01 Published: 2018-02-07

Abstract

Abstract: Conventional parametric speech synthesis based on hidden Markov models (HMMs) gains little in prediction quality as the training data grows. Long Short-Term Memory (LSTM) networks learn long-range features within a sequence and, given large-scale parallel numerical computation, yield more accurate speech durations and smoother spectral models, yet part of their computation can be simplified away. This paper first analyzes the functional structure of the bidirectional LSTM, then removes the forget gate and the output gate, and finally models the mapping from textual phoneme information to the cepstrum. In comparative experiments on a Mandarin corpus, the simplified bidirectional LSTM halves the computation, so that both training and prediction time decrease by half, while Mel cepstral distortion falls from 3.4661 for the HMM to 1.9459.

Key words: parametric speech synthesis, neural network, Long Short-Term Memory (LSTM)
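
The abstract does not give the cell equations, so the following is only a minimal sketch of what an LSTM without forget and output gates could look like. It assumes that "removing" a gate means fixing it to 1, so the cell state accumulates gated candidates and the hidden state is simply `tanh` of the cell state. All names (`simplified_lstm_step`, `init_params`, `bidirectional`, the layer sizes) are illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, blocks, rng):
    """One (W, U, b) triple per named block, in the usual LSTM layout."""
    return {
        name: (rng.standard_normal((n_hid, n_in)) * 0.1,   # input weights
               rng.standard_normal((n_hid, n_hid)) * 0.1,  # recurrent weights
               np.zeros(n_hid))                            # bias
        for name in blocks
    }

def simplified_lstm_step(x, h_prev, c_prev, p):
    """One time step with only the input gate "i" and the candidate "g"."""
    Wi, Ui, bi = p["i"]
    Wg, Ug, bg = p["g"]
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)   # input gate
    g = np.tanh(Wg @ x + Ug @ h_prev + bg)   # candidate cell update
    c = c_prev + i * g                       # assumption: forget gate fixed to 1
    h = np.tanh(c)                           # assumption: output gate fixed to 1
    return h, c

def run_direction(xs, p, n_hid):
    """Run the cell over a (T, n_in) sequence and return (T, n_hid) states."""
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    out = []
    for x in xs:
        h, c = simplified_lstm_step(x, h, c, p)
        out.append(h)
    return np.stack(out)

def bidirectional(xs, p_fwd, p_bwd, n_hid):
    """Concatenate forward states with time-aligned backward states."""
    fwd = run_direction(xs, p_fwd, n_hid)
    bwd = run_direction(xs[::-1], p_bwd, n_hid)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

def n_params(p):
    return sum(W.size + U.size + b.size for W, U, b in p.values())

rng = np.random.default_rng(0)
n_in, n_hid, T = 5, 4, 6                     # hypothetical sizes
full = init_params(n_in, n_hid, ["i", "f", "o", "g"], rng)   # standard LSTM blocks
simple_fwd = init_params(n_in, n_hid, ["i", "g"], rng)
simple_bwd = init_params(n_in, n_hid, ["i", "g"], rng)

xs = rng.standard_normal((T, n_in))          # stand-in for per-frame text features
ys = bidirectional(xs, simple_fwd, simple_bwd, n_hid)
ratio = n_params(simple_fwd) / n_params(full)
```

Under this assumption each direction keeps two of the four weight blocks of a standard LSTM, so `ratio` is 0.5, which is consistent with the halved computation reported in the abstract. In a full system a further output layer, not shown here, would map `ys` to cepstral features.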

CHEN Zhousi, HU Wenxin. Speech synthesis using simplified LSTM[J]. Computer Engineering and Applications, 2018, 54(3): 131-135.

