Label generation for Chinese statistical parametric speech synthesis

Abstract

This paper designs a six-level context-dependent label format for Chinese statistical parametric speech synthesis, covering the initial/final level, syllable level, word level, prosodic word level, prosodic phrase level and sentence level. The input Chinese sentence is first normalized and then parsed to obtain its sentence structure and word segmentation. The initial, final and tone of each Chinese character are then obtained by grapheme-to-phoneme conversion. Finally, the Transformation-Based error-driven Learning (TBL) algorithm predicts the prosodic word boundaries and prosodic phrase boundaries of the input sentence. The context-dependent labels required for statistical parametric speech synthesis are generated from the context information produced by these text analysis and prosodic prediction steps. A Hidden Markov Model (HMM) based Mandarin statistical parametric speech synthesis system, with initials and finals as the synthesis units, is built to evaluate how different label information affects the quality of the synthesized speech. Subjective and objective tests show that the richer the context-dependent label information, the higher the quality of the synthesized speech.

Key words: text analysis, speech synthesis, context-dependent label, prosodic prediction, grapheme-to-phoneme conversion
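
The abstract does not give the exact label syntax, so the following is only a minimal sketch of how a six-level context-dependent label could be assembled once text analysis and prosodic prediction have produced a hierarchy. The field layout (`/A:` to `/F:`), the `sil` padding symbol, the function name `make_labels` and the toy sentence are all assumptions made for illustration, loosely modelled on HTS-style full-context labels; the prosodic boundaries are written by hand here rather than predicted by TBL.

```python
# Hypothetical input: sentence -> prosodic phrases -> prosodic words -> words -> syllables.
# Each syllable is (initial, final, tone); an empty initial would mark a zero-initial syllable.
SENTENCE = [
    [                                             # prosodic phrase 1
        [[("j", "in", 1), ("t", "ian", 1)]],      # prosodic word: jin1 tian1
        [[("t", "ian", 1), ("q", "i", 4)]],       # prosodic word: tian1 qi4
    ],
    [                                             # prosodic phrase 2
        [[("h", "en", 3)], [("h", "ao", 3)]],     # prosodic word: hen3 + hao3 (two words)
    ],
]


def make_labels(sentence):
    """Return one label string per initial/final unit (illustrative format only)."""
    n_phrases = len(sentence)
    n_syllables = sum(len(word) for pp in sentence for pw in pp for word in pw)

    # First pass: record each unit together with its position at every level.
    records = []
    for pp_i, pp in enumerate(sentence, 1):
        for pw_i, pw in enumerate(pp, 1):
            for w_i, word in enumerate(pw, 1):
                for s_i, (initial, final, tone) in enumerate(word, 1):
                    context = (
                        f"/A:{tone}"                 # syllable level: tone
                        f"/B:{s_i}_{len(word)}"      # word level: syllable position in word
                        f"/C:{w_i}_{len(pw)}"        # prosodic word level: word position
                        f"/D:{pw_i}_{len(pp)}"       # prosodic phrase level: prosodic word position
                        f"/E:{pp_i}_{n_phrases}"     # sentence level: prosodic phrase position
                        f"/F:{n_syllables}"          # sentence level: total syllable count
                    )
                    for unit in (initial, final):
                        if unit:                     # skip an empty (zero) initial
                            records.append((unit, context))

    # Second pass: add the initial/final level, i.e. the left and right neighbours.
    units = ["sil"] + [unit for unit, _ in records] + ["sil"]
    return [
        f"{units[i]}^{units[i + 1]}+{units[i + 2]}{context}"
        for i, (_, context) in enumerate(records)
    ]


if __name__ == "__main__":
    for label in make_labels(SENTENCE):
        print(label)
```

Each output line describes one initial or final together with its neighbours and its position in the word, prosodic word, prosodic phrase and sentence. Dropping some of the `/A:` to `/F:` fields would give the kind of reduced label set that the evaluation compares against richer ones.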

HAO Dongliang, YANG Hongwu, ZHANG Ce, ZHANG Shuai, GUO Lizhao, YANG Jingbo. Label generation for Chinese statistical parametric speech synthesis[J]. Computer Engineering and Applications, 2016, 52(19): 146-153.


