Computer Engineering and Applications, 2016, Vol. 52, Issue (19): 154-160.

• Pattern Recognition and Artificial Intelligence •

Research on Prosody Conversion for Uyghur Emotional Speech

DU Nannan, ZHAO Hui

  1. College of Information Sciences and Engineering, Xinjiang University, Urumqi 830046
  • Publication date: 2016-10-01  Release date: 2016-11-18

Research on prosodic hierarchy conversion for Uyghur emotional speech

DU Nannan, ZHAO Hui   

  1. College of Information Sciences and Engineering, Xinjiang University, Urumqi 830046, China
  • Online:2016-10-01 Published:2016-11-18

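Abstract: A prosody modeling and conversion method is proposed for Uyghur emotional speech conversion. Drawing on the prosodic and linguistic characteristics of Uyghur, the method is the first to use the discrete cosine transform (DCT) to parameterize the emotional fundamental frequency (F0) of Uyghur syllables and of prosodic phrases separately. A Gaussian mixture model (GMM) is trained on joint neutral-emotional F0 features, and emotional speech is synthesized both at the neutral speaking rate and at the emotional speaking rate; subjective evaluation shows that the emotional speaking rate conveys the emotion more effectively. Subjective and objective experiments show that the method converts Uyghur emotional prosody effectively: for all three emotions, the results for both syllables and prosodic phrases exceed 75%, and conversion at the prosodic-phrase level is slightly better than at the syllable level.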
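The abstract does not specify the DCT setup, so the following is only a minimal sketch under stated assumptions: each voiced F0 contour (one syllable or one prosodic phrase) is converted to log-F0, projected onto an orthonormal DCT-II basis, and truncated to its first few coefficients, giving a fixed-length parameter vector regardless of contour duration. The log-F0 domain, the coefficient count `n_coeffs=5` and the helper names `dct_matrix`, `parameterize_f0` and `reconstruct_f0` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis


def parameterize_f0(f0_hz, n_coeffs: int = 5) -> np.ndarray:
    """Encode one voiced F0 contour (syllable or prosodic phrase) as the first
    n_coeffs DCT coefficients of its log-F0 (assumed setup, not the paper's)."""
    log_f0 = np.log(np.asarray(f0_hz, dtype=float))
    return dct_matrix(len(log_f0))[:n_coeffs] @ log_f0


def reconstruct_f0(coeffs, length: int) -> np.ndarray:
    """Inverse DCT with the discarded high-order coefficients treated as zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    basis = dct_matrix(length)[: len(coeffs)]
    return np.exp(basis.T @ coeffs)


# Example: a 40-frame rise-fall contour between 180 Hz and 220 Hz.
frames = np.linspace(0.0, 1.0, 40)
f0 = 180.0 + 40.0 * np.sin(np.pi * frames)
coeffs = parameterize_f0(f0, n_coeffs=5)
smoothed = reconstruct_f0(coeffs, len(f0))
```

Coefficient 0 equals the mean log-F0 scaled by the square root of the contour length, so it captures pitch register; the next few coefficients capture slope and curvature. Because the basis is orthonormal, keeping all coefficients reproduces the contour exactly, while truncation yields the least-squares smooth approximation in the log domain.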

Keywords: fundamental frequency, emotional speech conversion, discrete cosine transform, Gaussian mixture model, syllable, prosodic phrase

Abstract: A prosody conversion method is proposed for transforming neutral Uyghur speech into emotional speech. Taking the prosodic and linguistic features of Uyghur into account, the method is the first to use the Discrete Cosine Transform (DCT) to parameterize the emotional fundamental frequency (F0) of Uyghur syllables and prosodic phrases. A Gaussian Mixture Model (GMM) is trained on the joint neutral and emotional F0 features, and emotional speech is then synthesized at both the neutral speaking rate and the emotional speaking rate. Listening tests show that the emotional speaking rate helps convey the emotion more effectively. Objective evaluation and listening tests show that the method converts Uyghur emotional prosody effectively: for all three emotions, conversion of both syllables and prosodic phrases achieves more than 75% accuracy in listening tests, and prosodic phrases perform better than syllables.
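The abstract states only that a GMM is trained on joint neutral-emotional F0 features. A common way to use such a joint-density GMM, assumed here rather than confirmed by the paper, is the minimum mean-square-error mapping: for a neutral vector x, each component m contributes its conditional mean E[y | x, m] = μ_y^(m) + Σ_yx^(m) (Σ_xx^(m))^(-1) (x − μ_x^(m)), weighted by the posterior p(m | x). The sketch below implements only this conversion step with NumPy. It expects parameters shaped like the `weights_`, `means_` and `covariances_` attributes of scikit-learn's `GaussianMixture(covariance_type="full")` fitted on stacked `[x, y]` vectors; the function names and the toy parameters are hypothetical.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def gmm_convert(x, weights, means, covs, dim_x):
    """MMSE conversion of a source (neutral) vector x into a target (emotional)
    vector under a joint GMM over z = [x, y]; dim_x is the length of x."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    log_post = np.empty(len(weights))
    cond_means = []
    for m in range(len(weights)):
        mu_x, mu_y = means[m, :dim_x], means[m, dim_x:]
        s_xx = covs[m, :dim_x, :dim_x]
        s_yx = covs[m, dim_x:, :dim_x]
        # Unnormalised log posterior of component m, using the x-marginal only.
        log_post[m] = np.log(weights[m]) + log_gaussian(x, mu_x, s_xx)
        # Conditional mean of y given x within component m.
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    return post @ np.array(cond_means)


# Toy 1-D example with two hand-set components (not trained on real data).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 100.0]]
covs = [np.eye(2), np.eye(2)]
y_low = gmm_convert([0.0], weights, means, covs, dim_x=1)
y_high = gmm_convert([10.0], weights, means, covs, dim_x=1)
```

In the setting described by the abstract, x would be the DCT vector of a neutral syllable or prosodic phrase, and the output an estimated emotional DCT vector that the inverse DCT turns back into an F0 contour. The posterior weighting lets each mixture component specialize in one region of the neutral feature space.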

Key words: fundamental frequency, emotional speech conversion, Discrete Cosine Transform(DCT), Gaussian Mixture Model(GMM), syllable, prosodic phrase