Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (20): 149-153.

• Pattern Recognition and Artificial Intelligence •

Speech endpoint detection based on EMD and cross-entropy

XUE Juntao, WENG Yuru, ZHANG Jun   

  1. School of Electrical Engineering & Automation, Tianjin University, Tianjin 300072, China
  • Online: 2016-10-15 Published: 2016-10-14

Abstract: Speech endpoint detection based on Empirical Mode Decomposition (EMD) loses accuracy and adaptability in adverse environments. To address this problem, this paper proposes a speech endpoint detection algorithm based on EMD and cross-entropy. An analysis of the EMD decomposition shows that the probability distribution of white noise across the Intrinsic Mode Functions (IMFs) is fixed and independent of the noise amplitude. Because this distribution differs from that of a speech signal, cross-entropy is used to measure the difference between speech frames and noise frames. The EMD-energy feature and the cross-entropy are complementary, so they are combined into a joint decision for speech endpoint detection. An adaptive threshold is used to cope with adverse environments: it tracks changes in the noise energy and updates itself to improve detection accuracy. Simulation results indicate that the algorithm is effective and offers superior performance under low Signal-to-Noise Ratio (SNR) and non-stationary noise.

Key words: endpoint detection, Empirical Mode Decomposition (EMD), cross-entropy, adaptive threshold, low Signal-to-Noise Ratio (SNR)
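
The abstract does not give formulas, so the following is only a rough sketch of how the ideas it describes might fit together, not the authors' implementation. It assumes the IMFs of each frame have already been computed by some EMD routine, takes "cross-entropy" in its relative form (sum of p·ln(p/q), zero when the two distributions match), combines the two features with a simple logical OR, and updates the noise energy by exponential smoothing. The names `imf_distribution`, `relative_cross_entropy`, `AdaptiveDetector` and all parameter values are hypothetical.

```python
import math


def imf_distribution(imfs):
    """Normalize the per-IMF energies of one frame into a probability distribution."""
    energies = [sum(s * s for s in imf) for imf in imfs]
    total = sum(energies)
    if total == 0.0:
        # Silent frame: fall back to a uniform distribution.
        return [1.0 / len(energies)] * len(energies)
    return [e / total for e in energies]


def relative_cross_entropy(p, q, eps=1e-12):
    """Relative cross-entropy sum(p_i * ln(p_i / q_i)); assumed form, zero when p == q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


class AdaptiveDetector:
    """Frame-level speech/noise decision with a self-updating energy threshold (assumed rule)."""

    def __init__(self, noise_dist, noise_energy,
                 energy_factor=3.0, ce_threshold=0.1, alpha=0.9):
        self.noise_dist = noise_dist        # reference IMF energy distribution of noise
        self.noise_energy = noise_energy    # running estimate of the noise frame energy
        self.energy_factor = energy_factor  # hypothetical multiplier for the energy threshold
        self.ce_threshold = ce_threshold    # hypothetical cross-entropy threshold
        self.alpha = alpha                  # hypothetical smoothing coefficient

    def is_speech(self, imfs):
        energy = sum(s * s for imf in imfs for s in imf)
        ce = relative_cross_entropy(imf_distribution(imfs), self.noise_dist)
        # Either feature exceeding its threshold marks the frame as speech.
        speech = (energy > self.energy_factor * self.noise_energy
                  or ce > self.ce_threshold)
        if not speech:
            # Track slow changes in noise energy using noise frames only.
            self.noise_energy = (self.alpha * self.noise_energy
                                 + (1.0 - self.alpha) * energy)
        return speech


if __name__ == "__main__":
    noise = [[0.1, -0.1, 0.1, -0.1], [0.05, 0.05, -0.05, -0.05]]
    detector = AdaptiveDetector(imf_distribution(noise), noise_energy=0.05)
    louder_noise = [[1.5 * s for s in imf] for imf in noise]
    quiet_speech = [[0.01, -0.01, 0.01, -0.01], [0.1, 0.1, -0.1, -0.1]]
    print(detector.is_speech(louder_noise))
    print(detector.is_speech(quiet_speech))
```

In this toy setup, scaling the noise frame leaves its IMF energy distribution unchanged, which mirrors the amplitude-independence property the abstract relies on, while a low-energy frame with a different distribution is still flagged through the cross-entropy term.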