Computer Engineering and Applications, 2016, Vol. 52, Issue 15: 168-171.


Voice conversion using deep belief networks

WANG Min, HUANG Fei, LIU Li, WEI Mingfei, WANG Mingming   

  1. School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
  Online: 2016-08-01; Published: 2016-08-12


Abstract: This paper presents a voice conversion technique that uses Deep Belief Nets (DBNs) to build high-order eigen spaces of the source and target speakers, in which converting source speech to target speech is easier than in the traditional cepstrum space. A DBN is trained for each of the source and target speakers, and an Artificial Neural Network (ANN) then connects the two speakers' individuality abstractions and converts one into the other. The converted abstraction of the source speaker is brought back to the cepstrum space by running the target speaker's DBN in reverse. Speaker voice conversion experiments, compared against the conventional Gaussian Mixture Model (GMM)-based method, confirm the efficacy of the approach under both subjective and objective criteria.

Key words: voice conversion, speaker characteristics, deep belief networks, high-order eigen spaces
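The pipeline in the abstract (one DBN per speaker, an ANN between the two top-layer spaces, then the target DBN run top-down) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Bernoulli RBMs trained with one-step contrastive divergence on features min-max scaled to [0, 1] (a Gaussian-Bernoulli first layer is a common alternative for real-valued cepstra), a one-hidden-layer ANN, arbitrary layer sizes and learning rates, and synthetic data in place of real source and target frames that have been time-aligned (for example by dynamic time warping). The names `RBM`, `DBN`, `MLP` and `convert` are hypothetical.

```python
import numpy as np

# Sketch of the DBN -> ANN -> inverse-DBN conversion pipeline.
# ASSUMPTIONS (not from the paper): Bernoulli RBMs trained with CD-1, features
# min-max scaled to [0, 1], and the layer sizes / learning rates used below.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid, rng):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.a = np.zeros(n_vis)  # visible bias
        self.b = np.zeros(n_hid)  # hidden bias
        self.rng = rng

    def up(self, v):    # mean hidden activation p(h=1 | v)
        return sigmoid(v @ self.W + self.b)

    def down(self, h):  # mean visible activation p(v=1 | h)
        return sigmoid(h @ self.W.T + self.a)

    def train_cd1(self, v0, epochs=50, lr=0.1):
        n = len(v0)
        for _ in range(epochs):
            h0 = self.up(v0)
            h_sample = (self.rng.random(h0.shape) < h0).astype(float)
            v1 = self.down(h_sample)  # one Gibbs step (contrastive divergence, k=1)
            h1 = self.up(v1)
            self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
            self.a += lr * (v0 - v1).mean(axis=0)
            self.b += lr * (h0 - h1).mean(axis=0)

class DBN:
    """Greedy layer-wise stack of RBMs: up() encodes, down() runs the stack in reverse."""
    def __init__(self, sizes, rng):
        self.rbms = [RBM(m, n, rng) for m, n in zip(sizes[:-1], sizes[1:])]

    def pretrain(self, x, **kw):
        for rbm in self.rbms:
            rbm.train_cd1(x, **kw)
            x = rbm.up(x)  # this layer's output trains the next layer

    def up(self, x):
        for rbm in self.rbms:
            x = rbm.up(x)
        return x

    def down(self, h):
        for rbm in reversed(self.rbms):
            h = rbm.down(h)
        return h

class MLP:
    """One-hidden-layer ANN trained by gradient descent on 0.5 * mean squared error."""
    def __init__(self, n_in, n_hid, n_out, rng):
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hid))
        self.b1 = np.zeros(n_hid)
        self.W2 = 0.1 * rng.standard_normal((n_hid, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.h = np.tanh(x @ self.W1 + self.b1)
        return sigmoid(self.h @ self.W2 + self.b2)  # DBN codes lie in (0, 1)

    def fit(self, x, y, epochs=500, lr=0.5):
        n = len(x)
        for _ in range(epochs):
            out = self.forward(x)
            d_out = (out - y) * out * (1 - out) / n  # gradient w.r.t. output pre-activation
            d_h = (d_out @ self.W2.T) * (1 - self.h ** 2)
            self.W2 -= lr * self.h.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * x.T @ d_h
            self.b1 -= lr * d_h.sum(axis=0)

def convert(src_frames, dbn_src, ann, dbn_tgt):
    """Source features -> source high-order space -> target high-order space -> target features."""
    return dbn_tgt.down(ann.forward(dbn_src.up(src_frames)))

# Usage on HYPOTHETICAL synthetic data standing in for time-aligned source/target frames.
rng = np.random.default_rng(0)
src = rng.random((200, 24))
tgt = np.clip(0.8 * src + 0.1, 0.0, 1.0)

dbn_src = DBN([24, 48, 16], rng)
dbn_src.pretrain(src)
dbn_tgt = DBN([24, 48, 16], rng)
dbn_tgt.pretrain(tgt)
ann = MLP(16, 32, 16, rng)
ann.fit(dbn_src.up(src), dbn_tgt.up(tgt))

converted = convert(src, dbn_src, ann, dbn_tgt)
print(converted.shape)  # (200, 24)
```

Reading the "inverse process" as the top-down mean-field pass through the target DBN (`down()`) is itself an assumption; the paper may invert the network differently, and converted features would still need to be unscaled and passed to a vocoder for synthesis.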

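Abstract (Chinese version): This work studies in depth how the speaker-specific individuality information in speech can be represented and extracted, and proposes a voice conversion method based on Deep Belief Nets (DBN). Spectral parameters extracted from the source speaker's and the target speaker's speech are used to train separate DBNs, yielding a representation of each speaker's individuality in a high-order space. An Artificial Neural Network (ANN) connects these two high-order spaces and performs the feature conversion. The DBN trained on the target speaker's data then processes the converted features in reverse to recover the converted spectral parameters, from which the converted speech is synthesized. Experimental results show that, compared with the traditional GMM-based method, the proposed method performs better: both the quality of the converted speech and its similarity to the target speech are closer to those of the target speech.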
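The abstracts benchmark against the conventional GMM-based method without stating its form. A widely used formulation, assumed here rather than taken from the paper, trains a GMM with M components on joint source/target vectors and converts each source frame x with the minimum mean-square-error mapping:

```latex
F(\mathbf{x}) = \sum_{m=1}^{M} P(m \mid \mathbf{x})
  \left[ \boldsymbol{\mu}_m^{(y)} + \boldsymbol{\Sigma}_m^{(yx)} \left(\boldsymbol{\Sigma}_m^{(xx)}\right)^{-1} \left(\mathbf{x} - \boldsymbol{\mu}_m^{(x)}\right) \right],
\qquad
P(m \mid \mathbf{x}) = \frac{w_m \, \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_m^{(x)}, \boldsymbol{\Sigma}_m^{(xx)}\right)}
  {\sum_{k=1}^{M} w_k \, \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_k^{(x)}, \boldsymbol{\Sigma}_k^{(xx)}\right)}
```

Here w_m are the mixture weights, and the superscripts (x), (y), (xx), (yx) select the source and target blocks of each component's mean vector and covariance matrix. The converted frame is a posterior-weighted combination of per-component linear regressions; the paper's exact baseline configuration (number of components, covariance type) is not given in the abstract.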
