Computer Engineering and Applications, 2018, Vol. 54, Issue (3): 131-135. DOI: 10.3778/j.issn.1002-8331.1608-0332
• Pattern Recognition and Artificial Intelligence •
CHEN Zhousi (陈宙斯), HU Wenxin (胡文心)
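Abstract: With larger training sets, parametric speech synthesis based on the conventional hidden Markov model (HMM) shows little further gain in prediction quality. A long short-term memory (LSTM) neural network learns long-range features within a sequence: it produces each output from both the current input and its internal state and, trained with large-scale parallel numerical computation, yields more accurate speech durations and a more coherent spectral model. Part of its computation, however, can be simplified. This paper first analyzes the functional structure of the bidirectional LSTM (BLSTM), then removes the forget gate and the output gate, and finally models the mapping from textual phoneme information to the cepstrum. Comparative experiments on a Mandarin Chinese speech corpus show that the simplified BLSTM halves the amount of computation, with both training and prediction time dropping by half, and reduces the Mel cepstral distortion from 3.4661 for the HMM to 1.9459.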
Key words: parametric speech synthesis, neural network, Long Short-Term Memory (LSTM)
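The abstract says only that the forget gate and output gate are removed from a bidirectional LSTM. The following is a minimal sketch, assuming "removed" means each of those gates is fixed to 1, so the cell keeps an input gate i_t and a candidate block g_t, with c_t = c_{t-1} + i_t * g_t and h_t = tanh(c_t). The class and function names (SimplifiedLSTMCell, bidirectional, gate_weight_count), the sizes, and the random initialization are hypothetical illustrations, not the authors' code. gate_weight_count shows where the halving comes from: a standard cell has four weight blocks (input, forget, output, candidate) while this one keeps two.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x: Sequence[float], h: Sequence[float]) -> Vector:
    """Return W @ x + U @ h + b, one entry per hidden unit."""
    return [
        sum(w * xv for w, xv in zip(W[k], x))
        + sum(u * hv for u, hv in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SimplifiedLSTMCell:
    """Hypothetical LSTM cell without forget and output gates (both assumed fixed to 1)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Input gate weights.
        self.W_i = random_matrix(rng, hidden_size, input_size)
        self.U_i = random_matrix(rng, hidden_size, hidden_size)
        self.b_i = [0.0] * hidden_size
        # Candidate (cell input) weights.
        self.W_g = random_matrix(rng, hidden_size, input_size)
        self.U_g = random_matrix(rng, hidden_size, hidden_size)
        self.b_g = [0.0] * hidden_size

    def step(self, x: Sequence[float], h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(z) for z in affine(self.W_i, self.U_i, self.b_i, x, h)]
        g = [math.tanh(z) for z in affine(self.W_g, self.U_g, self.b_g, x, h)]
        c_new = [cv + iv * gv for cv, iv, gv in zip(c, i, g)]  # no forget gate: f_t = 1
        h_new = [math.tanh(cv) for cv in c_new]  # no output gate: o_t = 1
        return h_new, c_new

    def run(self, xs: Sequence[Sequence[float]]) -> List[Vector]:
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bidirectional(fwd: SimplifiedLSTMCell, bwd: SimplifiedLSTMCell,
                  xs: Sequence[Sequence[float]]) -> List[Vector]:
    """Concatenate forward outputs with backward outputs re-aligned to time order."""
    forward = fwd.run(xs)
    backward = bwd.run(list(xs)[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


def gate_weight_count(input_size: int, hidden_size: int, n_blocks: int) -> int:
    """Weights plus biases for n_blocks gate/candidate blocks of one cell."""
    return n_blocks * (hidden_size * (input_size + hidden_size) + hidden_size)


if __name__ == "__main__":
    frames = [[0.1 * t, -0.2, 0.3] for t in range(5)]  # hypothetical 3-dim phoneme features
    layer = bidirectional(SimplifiedLSTMCell(3, 4, seed=1), SimplifiedLSTMCell(3, 4, seed=2), frames)
    print(len(layer), len(layer[0]))  # 5 8
    print(gate_weight_count(3, 4, 4), gate_weight_count(3, 4, 2))  # 128 64
```

Because the per-step cost is dominated by the matrix products inside affine, keeping two blocks instead of four roughly halves the multiply-adds, which is consistent with the halving the abstract reports. Pure Python is used only for readability; a practical system would use a tensor library.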
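The abstract reports Mel cepstral distortion (MCD) without giving its exact definition. A common one averages, over time-aligned frames, (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), usually skipping the energy coefficient c_0 and reported in dB. The sketch below assumes that definition and frames that are already aligned one-to-one; whether the paper's 3.4661 and 1.9459 figures use exactly this variant is not stated, and the function name mel_cepstral_distortion is hypothetical.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    reference: Sequence[Sequence[float]],
    synthesized: Sequence[Sequence[float]],
    skip_c0: bool = True,
) -> float:
    """Mean frame-wise MCD in dB under one common definition (an assumption).

    Both inputs are lists of frames of mel-cepstral coefficients c_0..c_D,
    assumed to be time-aligned already (no DTW is performed here).
    """
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("need the same non-zero number of aligned frames")
    scale = 10.0 / math.log(10.0)
    start = 1 if skip_c0 else 0  # c_0 (energy) is commonly excluded
    total = 0.0
    for ref, syn in zip(reference, synthesized):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same cepstral order")
        squared = sum((r - s) ** 2 for r, s in zip(ref[start:], syn[start:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(reference)


if __name__ == "__main__":
    ref = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
    print(mel_cepstral_distortion(ref, ref))  # 0.0
```

Lower values mean the predicted cepstra are closer to the natural ones, which is the sense in which the drop from 3.4661 to 1.9459 is an improvement.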
CHEN Zhousi, HU Wenxin. Speech synthesis using simplified LSTM (简化LSTM的语音合成)[J]. Computer Engineering and Applications, 2018, 54(3): 131-135.
Link to this article: http://cea.ceaj.org/CN/10.3778/j.issn.1002-8331.1608-0332
http://cea.ceaj.org/CN/Y2018/V54/I3/131