计算机工程与应用 ›› 2015, Vol. 51 ›› Issue (6): 199-203.

• 信号处理 •

面向藏语语音合成的语音基元自动标注方法

徐世鹏,杨鸿武,王海燕   

  1. 西北师范大学 物理与电子工程学院,兰州 730070
  • 出版日期:2015-03-15 发布日期:2015-03-13

Speech unit segmentation for Tibetan speech synthesis

XU Shipeng, YANG Hongwu, WANG Haiyan   

  1. College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China
  • Online:2015-03-15 Published:2015-03-13

摘要: 在基于隐Markov模型(Hidden Markov Model,HMM)的统计参数藏语语音合成中引入了DAEM(Deterministic Annealing EM)算法,对没有时间标注的藏语训练语音进行自动时间标注。以声母和韵母为合成基元,在声母和韵母的声学模型的训练过程中,利用DAEM算法确定HMM模型的嵌入式重估的最佳参数。训练好声学模型后,再利用强制对齐自动获得声母和韵母的时间标注。实验结果表明,该方法对声母和韵母的时间标注接近手工标注的结果。对合成的藏语语音进行主观评测表明,该方法合成的藏语语音和手工标注声、韵母时间的方法合成的藏语语音的音质接近。因此,利用该方法可以在不需要声、韵母的时间标注的情况下建立合成基元的声学模型。

关键词: 藏语语音合成, 确定性退火期望值最大化(DAEM)算法, 自动标注, 时间标注

Abstract: This paper introduces the Deterministic Annealing Expectation Maximization (DAEM) algorithm into HMM-based statistical parametric Tibetan speech synthesis, in order to label time boundaries automatically for Tibetan training speech that has no time labels. Initials and finals are used as the synthesis units. During training of the acoustic models of the initials and finals, the DAEM algorithm is used to determine the optimal parameters in the embedded re-estimation of the HMMs. Once the acoustic models have been trained, the time labels of the initials and finals are obtained automatically by forced alignment. Experiments show that the unit boundaries obtained by the proposed method are close to manually labeled boundaries. Subjective evaluation of the synthesized Tibetan speech shows that its quality is close to that of speech synthesized from a corpus with manually labeled initial and final boundaries. The proposed method can therefore be used to build acoustic models of the synthesis units without time labels for the initials and finals.
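To make the annealing idea concrete, the following is a minimal sketch of the tempered E-step that distinguishes DAEM from ordinary EM. It is not the authors' implementation: the function name `tempered_posteriors`, the example scores, and the schedule of `beta` values are all assumptions chosen for illustration. The posterior over hidden values is computed from the joint likelihood raised to the power `beta`, so `beta = 0` gives a uniform posterior and `beta = 1` recovers the ordinary EM posterior.

```python
import math


def tempered_posteriors(log_joint, beta):
    """Return DAEM posteriors: exp(beta * log_joint), normalised to sum to 1.

    log_joint[k] is log p(observation, hidden value k) under the current model.
    beta = 0 gives a uniform posterior; beta = 1 gives the ordinary EM posterior.
    """
    scaled = [beta * lj for lj in log_joint]
    top = max(scaled)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - top) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical log joint scores of one frame under three candidate states.
log_joint = [-1.0, -3.0, -6.0]

# Hypothetical annealing schedule: beta rises from 0 to 1.
for beta in (0.0, 0.25, 0.5, 1.0):
    post = tempered_posteriors(log_joint, beta)
    print(beta, [round(p, 3) for p in post])
```

In a full training loop, each value of `beta` would be followed by the usual M-step until convergence before `beta` is raised. Starting from a nearly uniform posterior is what makes the result less sensitive to the initial model, which matters when no time labels are available to initialize the HMMs.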

Key words: Tibetan speech synthesis, Deterministic Annealing Expectation Maximization (DAEM) algorithm, automatic labeling, time labeling
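The forced-alignment step mentioned in the abstract can be illustrated with a small sketch. This is a simplified stand-in, not the paper's system: it assumes each synthesis unit is a single state, that the unit sequence of the utterance is known, and that a hypothetical table `frame_scores[t][u]` already holds the log-likelihood of frame `t` under unit `u`. A Viterbi pass over this left-to-right sequence then yields one frame range per unit.

```python
def forced_align(frame_scores):
    """Align frames to a known left-to-right unit sequence with Viterbi.

    frame_scores[t][u] is the log-likelihood of frame t under the u-th unit.
    Returns one (start_frame, end_frame_exclusive) pair per unit.
    """
    n_frames = len(frame_scores)
    n_units = len(frame_scores[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")

    neg_inf = float("-inf")
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    entered = [[False] * n_units for _ in range(n_frames)]  # unit u starts at t
    best[0][0] = frame_scores[0][0]  # the path must start in the first unit

    for t in range(1, n_frames):
        for u in range(n_units):
            stay = best[t - 1][u]
            advance = best[t - 1][u - 1] if u > 0 else neg_inf
            if advance > stay:
                best[t][u] = advance + frame_scores[t][u]
                entered[t][u] = True
            else:
                best[t][u] = stay + frame_scores[t][u]

    # Backtrace from the last unit at the last frame to recover unit starts.
    starts = [0] * n_units
    u = n_units - 1
    for t in range(n_frames - 1, 0, -1):
        if entered[t][u]:
            starts[u] = t
            u -= 1
    ends = starts[1:] + [n_frames]
    return list(zip(starts, ends))


# Hypothetical scores: two units (e.g. an initial and a final), five frames.
scores = [[0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
print(forced_align(scores))
```

Multiplying the frame indices by the analysis frame shift converts these ranges into time labels. A real system would align multi-state HMMs per unit rather than single states.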