Hybrid unit selection speech synthesis system target cost construction

Computer Engineering and Applications, 2018, Vol. 54, Issue (24): 20-25. DOI: 10.3778/j.issn.1002-8331.1810-0354

CAI Wenbin¹, WEI Yunlong¹, XU Haihua², PAN Lin¹

1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
2. Temasek Laboratory, Nanyang Technological University, Singapore 639798, Singapore

Online: 2018-12-15 Published: 2018-12-14

Abstract

Concatenative unit selection is generally guided by minimizing the sum of the target cost and the concatenation cost. Because candidate units involve complex linguistic and acoustic properties, the key to improving synthesized speech quality is choosing acoustic (or linguistic) features that accurately represent unit characteristics and constructing the corresponding target cost. This paper explores target cost construction from two aspects: acoustic features (Mel-generalized cepstrum, log fundamental frequency, and bottleneck features) and acoustic models. Experimental results show that a Deep Neural Network (DNN) based acoustic model, trained on a large similar corpus and fine-tuned on the current corpus, predicts more robust bottleneck features. Using those features in the target cost calculation guides the selection of better candidate units and improves the quality of the synthesized speech.

Key words: speech synthesis, target cost, acoustic features, acoustic models, concatenative units
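
The selection criterion described in the abstract, minimizing the sum of target and concatenation costs over a sequence of candidate units, can be illustrated with a small dynamic-programming search. The sketch below is not the paper's implementation: the abstract does not give the cost definitions, so the Euclidean distances, the weights `w_target` and `w_concat`, and all function names here are assumptions chosen for illustration. Each unit is represented by a single feature vector, standing in for, say, a bottleneck feature predicted by an acoustic model.

```python
from math import dist  # Euclidean distance, Python 3.8+


def target_cost(predicted, candidate):
    """Assumed form: distance between predicted and candidate features."""
    return dist(predicted, candidate)


def concat_cost(left, right):
    """Assumed form: distance between features of adjacent candidates."""
    return dist(left, right)


def select_units(targets, candidates, w_target=1.0, w_concat=1.0):
    """Viterbi-style search over candidate units.

    targets:    one predicted feature vector per position
    candidates: for each position, a list of candidate feature vectors
    Returns (total_cost, chosen_candidate_index_per_position).
    """
    # best[j]: lowest cumulative cost of a path ending at candidate j
    best = [w_target * target_cost(targets[0], c) for c in candidates[0]]
    backptrs = []
    for t in range(1, len(targets)):
        new_best, ptrs = [], []
        for c in candidates[t]:
            # Cheapest way to reach candidate c from the previous position
            prev_cost, prev_idx = min(
                (best[i] + w_concat * concat_cost(p, c), i)
                for i, p in enumerate(candidates[t - 1])
            )
            new_best.append(prev_cost + w_target * target_cost(targets[t], c))
            ptrs.append(prev_idx)
        best = new_best
        backptrs.append(ptrs)
    # Trace the cheapest path back from the last position
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for ptrs in reversed(backptrs):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    targets = [(0.0, 0.0), (1.0, 1.0)]
    candidates = [[(0.0, 0.0), (5.0, 5.0)], [(9.0, 9.0), (1.0, 1.0)]]
    print(select_units(targets, candidates))
```

In a real system the concatenation cost would normally be measured at the join boundary between units rather than between whole-unit vectors; the single-vector form above is a simplification to keep the example short.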

CAI Wenbin, WEI Yunlong, XU Haihua, PAN Lin. Hybrid unit selection speech synthesis system target cost construction[J]. Computer Engineering and Applications, 2018, 54(24): 20-25.
