Computer Engineering and Applications, 2009, Vol. 45, Issue (22): 194-196. DOI: 10.3778/j.issn.1002-8331.2009.22.062

• Engineering and Applications

Research on Wavelet-Transform Extraction and Recognition of Pathological Voice Features

YU Yan-ping1,2, HU Wei-ping1

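  1. College of Physics and Electronic Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. Department of Electronic Engineering, Liuzhou Railway Vocational Technical College, Liuzhou, Guangxi 545007, China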
  • Received: 2008-04-23 Revised: 2008-07-24 Published: 2009-08-01 Online: 2009-08-01
  • Corresponding author: YU Yan-ping

Research on extraction of pathological voice characteristics and recognition based on wavelet transform and Gaussian mixture model

YU Yan-ping1,2, HU Wei-ping1

  1. College of Physics and Electronic Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. Department of Electronic Engineering, Liuzhou Railway Vocational Technical College, Liuzhou, Guangxi 545007, China
  • Received: 2008-04-23 Revised: 2008-07-24 Online: 2009-08-01 Published: 2009-08-01
  • Contact: YU Yan-ping

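Abstract: By analyzing the mechanism of voice production and the differences between pathological and normal voices in the frequency domain, the signal is decomposed with the wavelet transform to highlight the characteristics of pathological voices, and an entropy coefficient based on wavelet de-noising and decomposition within a multi-scale analysis (Entropy Coefficient based on De-noise, Decomposition of Multi-scale Analysis, ECDDMA) is proposed as the feature vector set for recognition. For comparison, the Mel-frequency cepstral coefficients (MFCC), a classic feature in speech recognition, are also analyzed. Using each of the two feature sets in turn, 242 normal and 234 pathological voice samples are recognized with a Gaussian mixture model (GMM). The results show that the ECDDMA coefficients characterize the differences between normal and pathological voices more accurately than the traditional MFCC (which model the nonlinear frequency perception of the human ear) and their dynamic features, and that they help raise the recognition rates of pathological and normal voices at the same time.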
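The abstract does not give the exact ECDDMA formula, so the following is only a minimal sketch of the general idea under stated assumptions: a Haar wavelet (the paper's wavelet is not named here), Donoho-Johnstone universal soft thresholding for the de-noising step, and the Shannon entropy of the normalized coefficient energy inside each subband as the "entropy coefficient". The function names (`haar_dwt`, `soft_threshold`, `subband_entropy`, `ecddma_like_features`) are hypothetical, not the authors' code.

```python
import math
import statistics


def haar_dwt(x):
    """One level of the orthonormal Haar DWT (a trailing odd sample is dropped)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(x) - 1, 2)
    approx = [(x[i] + x[i + 1]) * s for i in pairs]
    detail = [(x[i] - x[i + 1]) * s for i in pairs]
    return approx, detail


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (soft thresholding)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def subband_entropy(coeffs):
    """Shannon entropy of the normalized energy distribution inside one subband."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return 0.0
    h = 0.0
    for c in coeffs:
        p = c * c / energy
        if p > 0.0:
            h -= p * math.log(p)
    return h


def ecddma_like_features(signal, levels=4):
    """Hypothetical ECDDMA-style vector: [H(d1), ..., H(d_levels), H(a_levels)]."""
    if levels < 1 or len(signal) < 2 ** levels:
        raise ValueError("signal too short for the requested number of levels")
    approx, details = list(signal), []
    for _ in range(levels):  # multi-scale decomposition
        approx, d = haar_dwt(approx)
        details.append(d)
    # Assumed de-noising: universal threshold, noise sigma from the finest band.
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(signal)))
    denoised = [[soft_threshold(c, t) for c in d] for d in details]
    return [subband_entropy(d) for d in denoised] + [subband_entropy(approx)]
```

As a sanity check, a constant 256-sample signal has zero entropy in every detail band and ln 16 in the 16-coefficient approximation band after four levels, since all of its energy is spread evenly over the approximation.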

Keywords: Gaussian mixture model (GMM), pathological voice, Mel-frequency cepstral coefficients (MFCC), wavelet transform

Abstract: Considering the voice production mechanism and the different behavior of abnormal and normal voices in the frequency domain, this paper uses wavelet decomposition to bring out the characteristics of pathological voices and proposes a new feature extraction method, the Entropy Coefficient based on De-noise, Decomposition of Multi-scale Analysis (ECDDMA), which is compared with MFCC, an effective and widely used speech feature. A total of 242 normal voice samples and 234 abnormal voice samples are recognized with a Gaussian Mixture Model (GMM), using MFCC and the newly extracted ECDDMA features in turn. The results indicate that the ECDDMA parameters are better suited to distinguishing normal from abnormal voices than the traditional MFCC and their dynamic features, which mimic the nonlinear frequency response of the human ear, and that they improve the recognition results for both abnormal and normal voices.
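The GMM recognition step can be pictured as training one mixture model per class (normal, pathological) and assigning a recording to the class whose model gives the higher average log-likelihood over its feature vectors. The sketch below assumes diagonal covariances, initialization from randomly chosen training vectors, a fixed number of EM iterations and a variance floor; the abstract states none of these, and `DiagGMM` and `classify` are hypothetical names rather than the authors' implementation.

```python
import math
import random


class DiagGMM:
    """Gaussian mixture model with diagonal covariances, trained by plain EM."""

    def __init__(self, n_components=2, n_iter=50, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor      # keeps variances from collapsing to 0
        self.rng = random.Random(seed)  # deterministic initialization

    @staticmethod
    def _logsumexp(vals):
        top = max(vals)
        return top + math.log(sum(math.exp(v - top) for v in vals))

    def _component_logs(self, x):
        """log(w_j) + log N(x | mu_j, diag(var_j)) for every component j."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.vars):
            ll = -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                            for xi, m, v in zip(x, mu, var))
            out.append(math.log(w) + ll)
        return out

    def fit(self, data):
        n, dim = len(data), len(data[0])
        # Means start at random training vectors; variances at the global variance.
        self.means = [list(p) for p in self.rng.sample(data, self.k)]
        mean = [sum(p[d] for p in data) / n for d in range(dim)]
        gvar = [max(sum((p[d] - mean[d]) ** 2 for p in data) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each vector.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                total = self._logsumexp(logs)
                resp.append([math.exp(lv - total) for lv in logs])
            # M-step: re-estimate weights, means and diagonal variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                mu = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                      for d in range(dim)]
                self.means[j] = mu
                self.vars[j] = [max(sum(r[j] * (x[d] - mu[d]) ** 2
                                        for r, x in zip(resp, data)) / nj,
                                    self.var_floor)
                                for d in range(dim)]
        return self

    def score(self, data):
        """Average log-likelihood per feature vector."""
        return sum(self._logsumexp(self._component_logs(x)) for x in data) / len(data)


def classify(frames, normal_gmm, patho_gmm):
    """Assign the class whose GMM explains the frames better."""
    if patho_gmm.score(frames) > normal_gmm.score(frames):
        return "pathological"
    return "normal"
```

In use, each class model would be fitted on frame-level ECDDMA or MFCC vectors from its training recordings, and `classify` applied to all frames of a test recording; the number of mixture components used in the paper is not given in the abstract.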

Key words: Gaussian Mixture Model (GMM), pathological voice, Mel Frequency Cepstrum Coefficient (MFCC), wavelet transformation