Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (3): 131-135. DOI: 10.3778/j.issn.1002-8331.1608-0332


Speech synthesis using simplified LSTM

CHEN Zhousi, HU Wenxin   

  1. Computer Center, East China Normal University, Shanghai 200062, China
  • Online:2018-02-01 Published:2018-02-07




Abstract: Conventional parametric speech synthesis based on hidden Markov models (HMMs) gains little when trained on large-scale data. Long Short-Term Memory (LSTM) networks are designed to capture long-term sequence features: they produce each output dynamically from both the current input and their internal state, which yields more accurate and smoother sequential predictions. However, LSTM still contains computation that can be trimmed. In this paper, LSTM is simplified by removing the forget gate and the output gate, and the simplified network is then used to model the relationship between syllables and their cepstra on a Chinese speech data set. Both training and prediction time decrease by half, while Mel cepstral distortion drops from 3.4661 for the HMM baseline to 1.9459.
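The gate removal described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact equations, so the following is an assumption: the forget gate and the output gate are both treated as fixed at 1, leaving only the input gate and the candidate update. All names (`init_block`, `simplified_lstm_step`, `count_params`) are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


def init_block(n_in, n_hidden, rng):
    """One weight block (input weights, recurrent weights, bias)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    U = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return W, U, b


def simplified_lstm_step(params, x, h_prev, c_prev):
    """One time step with only an input gate (assumed formulation)."""
    i = [sigmoid(v) for v in affine(*params["input_gate"], x, h_prev)]
    g = [math.tanh(v) for v in affine(*params["candidate"], x, h_prev)]
    # Assumption: no forget gate, so the previous cell state is carried over as-is.
    c = [c_p + i_k * g_k for c_p, i_k, g_k in zip(c_prev, i, g)]
    # Assumption: no output gate, so the hidden state is just tanh of the cell state.
    h = [math.tanh(v) for v in c]
    return h, c


def count_params(n_in, n_hidden, n_blocks):
    """Parameters in n_blocks weight blocks of the shape built by init_block."""
    return n_blocks * (n_hidden * n_in + n_hidden * n_hidden + n_hidden)


rng = random.Random(0)
n_in, n_hidden = 3, 4
params = {
    "input_gate": init_block(n_in, n_hidden, rng),
    "candidate": init_block(n_in, n_hidden, rng),
}
h, c = [0.0] * n_hidden, [0.0] * n_hidden
for x in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.5]):
    h, c = simplified_lstm_step(params, x, h, c)

# A standard LSTM cell has 4 such blocks; this sketch keeps 2.
ratio = count_params(n_in, n_hidden, 2) / count_params(n_in, n_hidden, 4)
```

Under this assumed formulation the cell keeps two of the four weight blocks of a standard LSTM, so the parameter count is halved, which is at least consistent with the halved training and prediction time reported above.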

Key words: parametric speech synthesis, neural network, Long Short-Term Memory (LSTM)

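Abstract (translated from Chinese): When the amount of training data is increased, the conventional hidden Markov model can hardly improve the prediction quality of parametric speech synthesis. Long short-term memory (LSTM) neural networks learn long-range features within a sequence and, with large-scale parallel numerical computation, obtain more accurate speech durations and more coherent spectral models, but they also contain computation that can be simplified. The paper first analyzes the functional structure of the bidirectional LSTM network, then removes the forget gate and the output gate, and finally models the mapping from text phoneme information to the cepstrum. Comparative experiments on a Mandarin corpus show that the simplified bidirectional LSTM network halves the amount of computation, and that Mel cepstral distortion drops from 3.4661 for the hidden Markov model to 1.9459.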
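For readers unfamiliar with the Mel cepstral distortion figure quoted in the abstract, the sketch below computes a commonly used form of the metric: the per-frame value (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²), averaged over frames, with the 0th (energy) coefficient skipped. This is an assumption; the abstract does not state which variant or unit the paper uses, and the function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, pred_frames, skip_c0=True):
    """Average per-frame Mel cepstral distortion (commonly used definition).

    ref_frames and pred_frames are equal-length lists of frames, each frame
    a list of mel-cepstral coefficients. Frames are assumed to be aligned.
    """
    if not ref_frames or len(ref_frames) != len(pred_frames):
        raise ValueError("need two non-empty, equally long frame sequences")
    start = 1 if skip_c0 else 0  # skip the energy term c0 by convention
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, pred in zip(ref_frames, pred_frames):
        squared_error = sum((r - p) ** 2 for r, p in zip(ref[start:], pred[start:]))
        total += scale * math.sqrt(squared_error)
    return total / len(ref_frames)


# Toy example with two frames of three coefficients each (made-up values).
reference = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
predicted = [[1.1, 0.4, -0.2], [0.8, 0.4, 0.0]]
mcd = mel_cepstral_distortion(reference, predicted)
```

Lower values mean the predicted cepstra lie closer to the reference, which is the direction of the improvement reported for the simplified network.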

Key words (translated from Chinese): parametric speech synthesis, neural network, long short-term memory neural network