Overview of Deep Learning Speech Synthesis Technology

doi:10.3778/j.issn.1002-8331.2101-0044

Abstract

Speech synthesis plays an important role in human-machine interaction, and advances in deep learning have driven its rapid development. Deep-learning-based speech synthesis surpasses traditional speech synthesis in both the quality of the synthesized speech and the speed of synthesis. This paper reviews deep-learning-based speech synthesis from the perspective of vocoders and acoustic models, discussing the working principles, advantages, and disadvantages of each. On this basis, it surveys speech synthesis systems, systematically reviewing the classic deep-learning-based systems, and concludes with an outlook on future developments in deep-learning-based speech synthesis.

Key words: speech synthesis, vocoder, acoustic model, end-to-end speech synthesis

ZHANG Xiaofeng, XIE Jun, LUO Jianxin, YANG Tao. Overview of Deep Learning Speech Synthesis Technology[J]. Computer Engineering and Applications, 2021, 57(9): 50-59.
