Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (34): 144-147.

• Database, Signal and Information Processing •

基于频域时域联合分析的语音端点检测

王坤赤,袁  燕,王建强,张裕胜,杨永杰   

  1. 南通大学 电子信息学院,江苏 南通 226019
  • 出版日期:2012-12-01 发布日期:2012-11-30

Speech endpoint detection based on frequency domain and time domain analyses

WANG Kunchi, YUAN Yan, WANG Jianqiang, ZHANG Yusheng, YANG Yongjie   

  1. Department of Electrical Information, Nantong University, Nantong, Jiangsu 226019, China
  • Online: 2012-12-01 Published: 2012-11-30

摘要: 通过计算语音频谱上谐波基频能量,在频域上检测浊音信号。因谐波频谱是乐音的基本特征,所以这种算法可以有效地消除各种非乐音噪音信号的影响,具有较高灵敏度和准确性。根据检测到的浊音位置和基频值,利用语音信号时域短时平稳特性,在时域上应用互相关系数确定相邻基音节,进而精确检测浊音信号的起始和终止端点。根据清音频率较高的特点,先对语音信号通过二阶微分提升高频能量。应用Teager能量算子可以同时分析能量和频率变化的特点检测纯净语音信号中清音的起始和终止端点。实验研究结果表明语音端点检测算法具有较高的可靠性和精确性。

关键词: 谐波, 互相关函数, Teager能量算子

Abstract: Voiced speech is detected in the frequency domain by computing the spectral harmonic energy associated with the fundamental frequency. Because a harmonic spectrum is the basic feature of a musical tone, this method effectively rejects non-harmonic noise and is both sensitive and accurate. Using the detected voiced positions and fundamental-frequency values, and relying on the short-time stationarity of speech in the time domain, adjacent pitch periods are identified with the cross-correlation coefficient, which precisely locates the start and end points of voiced segments. Because unvoiced sounds lie at higher frequencies, the signal is first passed through a second-order difference to boost its high-frequency energy. The Teager energy operator, which responds to changes in both energy and frequency, is then used to detect the start and end points of unvoiced sounds in clean speech. Experiments show that the endpoint detection algorithm is reliable and accurate.

Key words: harmonic, cross-correlation function, Teager energy operator
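
The operations named in the abstract and key words can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 8 kHz sampling rate, and the 200 Hz test tone are assumptions chosen for illustration. The sketch uses the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], a plain second-order difference as the high-frequency boost, and a zero-lag normalized cross-correlation coefficient to compare two adjacent pitch periods.

```python
import math


def second_difference(x):
    """Second-order difference y[n] = x[n+1] - 2*x[n] + x[n-1] (boosts high frequencies)."""
    return [x[n + 1] - 2.0 * x[n] + x[n - 1] for n in range(1, len(x) - 1)]


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def normalized_xcorr(a, b):
    """Zero-lag normalized cross-correlation coefficient of two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    # Two all-zero segments carry no information; report zero similarity.
    return num / den if den > 0.0 else 0.0


def adjacent_period_similarity(x, start, period):
    """Correlate the pitch period beginning at `start` with the one that follows it."""
    first = x[start:start + period]
    second = x[start + period:start + 2 * period]
    return normalized_xcorr(first, second)


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz (hypothetical)
    f0 = 200   # assumed fundamental frequency in Hz, i.e. 40 samples per period
    tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

    # Adjacent periods of a perfectly periodic tone are almost identical.
    print(adjacent_period_similarity(tone, 0, fs // f0))

    # Teager energy of the high-frequency-boosted signal, first few samples.
    print(teager_energy(second_difference(tone))[:3])
```

For a pure sinusoid A*sin(W*n + phi), the Teager energy is the constant A^2 * sin^2(W), so it rises with both amplitude and frequency. That property is why the operator suits high-frequency, low-amplitude unvoiced sounds. In a voiced segment, a correlation coefficient close to 1 between neighbouring periods indicates that the segment is still periodic; any decision threshold would have to be tuned and is not given in the abstract.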