Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (9): 38-42. DOI: 10.3778/j.issn.1002-8331.1806-0449


Feature Joint Optimization of Deep Belief Network for Speech Enhancement

WANG Yan, JIA Hairong, JI Huifang, WANG Weimei   

  1. College of Information and Computer, Taiyuan University of Technology, Yuci, Shanxi 030600, China
  • Online:2019-05-01 Published:2019-04-28



Abstract: To address the weak generalization ability of Deep Belief Network (DBN) models, which leads to poor speech enhancement performance, a regression DBN speech enhancement algorithm based on joint feature optimization is proposed. The algorithm makes no prior assumptions about the speech or the noise. Two features are extracted from the speech signal: the Log-Mel frequency Power Spectrum (LMPS) and the Mel-Frequency Cepstral Coefficients (MFCC). The LMPS is used directly to reconstruct the enhanced speech, which preserves perceptual quality, while the MFCC serves as an auxiliary, secondary feature. Both features are fed jointly into the DBN system to optimize all parameters of the original network architecture. This joint optimization imposes MFCC constraints that are not available when the LMPS is predicted directly, improves the generalization ability of the model in estimating the LMPS, and reconstructs the enhanced speech more accurately. Simulation results in different SNR environments show that, compared with single-feature optimization using the Log Power Spectrum (LPS) or the LMPS alone, joint optimization of LMPS and MFCC gives the enhanced speech higher PESQ and SNR, improving speech quality and intelligibility.

Key words: Deep Belief Network (DBN), speech enhancement, joint optimization, regression
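
To make the two features and the joint objective in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes an LMPS and derives MFCCs from it with a DCT, then combines the two regression errors into one loss. The sample rate, frame length, hop size, filter counts, the common 2595·log10(1 + f/700) mel formula, and the weight `alpha` are all assumptions chosen for illustration; the abstract does not state the exact form of the joint objective, so the weighted sum of mean squared errors is likewise hypothetical. The DBN itself (pre-training and fine-tuning) is not shown.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis

def extract_features(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return (LMPS, MFCC) with one row per frame; all defaults are illustrative."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # LMPS: log of the mel-filtered power spectrum (small floor avoids log(0)).
    lmps = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # MFCC: DCT of the log-mel spectrum, keeping the first n_mfcc coefficients.
    mfcc = lmps @ dct_matrix(n_mfcc, n_mels).T
    return lmps, mfcc

def joint_loss(lmps_pred, mfcc_pred, lmps_clean, mfcc_clean, alpha=0.5):
    """MSE on the primary LMPS target plus an alpha-weighted MSE on the
    auxiliary MFCC target (hypothetical form of the joint objective)."""
    primary = np.mean((lmps_pred - lmps_clean) ** 2)
    auxiliary = np.mean((mfcc_pred - mfcc_clean) ** 2)
    return primary + alpha * auxiliary

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)                  # stand-in for 1 s of speech
    noisy = clean + 0.5 * rng.standard_normal(16000)    # additive noise
    lmps_c, mfcc_c = extract_features(clean)
    lmps_n, mfcc_n = extract_features(noisy)
    # Loss of the trivial "predict the noisy features" baseline.
    print(joint_loss(lmps_n, mfcc_n, lmps_c, mfcc_c))
```

In this sketch the network would output both feature vectors for each frame, and the MFCC term acts as the extra constraint on the LMPS estimate that the abstract describes; only the LMPS output would then be used to reconstruct the waveform.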
