计算机工程与应用 (Computer Engineering and Applications), 2012, Vol. 48, Issue (32): 5-8.

• Doctoral Forum •

增强型语音可懂度的评价 (Evaluation of the intelligibility of enhanced speech)

马建芬 (MA Jianfen)1, 张雪英 (ZHANG Xueying)2

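  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024
  2. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024
  • Publication date: 2012-11-11  Release date: 2012-11-20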

Predicting intelligibility of noise-suppressed speech

MA Jianfen1, ZHANG Xueying2   

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
  2. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China
  • Online: 2012-11-11  Published: 2012-11-20

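Abstract (Chinese version): An objective intelligibility measure that correlates highly with subjective evaluation is proposed. The traditional intelligibility measure based on the frequency-domain segmental SNR correlates poorly with subjective evaluation because it does not compute two kinds of distortion, spectral attenuation and spectral amplification, separately. To overcome this shortcoming, the enhanced speech is decomposed into three parts: attenuation distortion, amplification distortion with a gain below 6.02 dB, and amplification distortion with a gain above 6.02 dB. The frequency-domain SNR of each part is computed separately, and multiple linear regression combines the three distortion values so that their correlation with subjective intelligibility scores is maximized. Experimental results show that the sentence intelligibility predicted by this method reaches a correlation of 0.91 with subjective evaluation.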

Key words (Chinese version): speech enhancement, speech intelligibility, speech distortion

Abstract: This paper proposes an objective measure for predicting the intelligibility of noise-suppressed speech that correlates highly with subjective scores. The traditional frequency-weighted segmental SNR (fSNRseg) measure does not correlate highly with subjective scores because it does not account separately for the spectral attenuation and spectral amplification distortions introduced by speech enhancement algorithms. In this study, the fSNRseg measure is decomposed into three regions, corresponding to attenuation distortion only, amplification distortion of up to 6.02 dB, and amplification distortion of 6.02 dB or greater, and fSNRseg is calculated in each region separately. Multiple-regression analysis is then run on the three decomposed measures to maximize the correlation with subjective scores. The proposed objective measure achieves a high correlation (0.91) with sentence recognition scores.

Key words: speech enhancement, speech intelligibility, speech distortions
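
The three-region decomposition summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the regions are defined on per-band magnitude spectra (6.02 dB is 20·log10(2), i.e. an enhanced magnitude of twice the clean magnitude), per-band SNR values are clipped to [-15, 15] dB, `weights` stands in for some band-importance weighting, and an empty region contributes 0 dB. All function names are hypothetical.

```python
import numpy as np

AMP_LIMIT = 2.0  # magnitude ratio of 2 equals 20*log10(2), about 6.02 dB


def region_masks(clean_mag, enh_mag):
    """Boolean masks for the three distortion regions, per frame and band."""
    attenuation = enh_mag < clean_mag
    mild_amp = (enh_mag >= clean_mag) & (enh_mag <= AMP_LIMIT * clean_mag)
    strong_amp = enh_mag > AMP_LIMIT * clean_mag
    return attenuation, mild_amp, strong_amp


def fsnrseg(clean_mag, enh_mag, mask, weights, snr_range=(-15.0, 15.0)):
    """Frequency-weighted segmental SNR over the cells where mask is True.

    clean_mag, enh_mag, mask: arrays of shape (frames, bands).
    weights: array of shape (bands,), assumed band-importance weights.
    snr_range: assumed clipping range in dB.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    snr = 10.0 * np.log10((clean_mag ** 2 + eps)
                          / ((clean_mag - enh_mag) ** 2 + eps))
    snr = np.clip(snr, *snr_range)
    w = np.where(mask, weights[np.newaxis, :], 0.0)
    frame_weight = w.sum(axis=1)
    used = frame_weight > 0  # frames with at least one band in the region
    if not used.any():
        return 0.0  # assumption: an empty region contributes 0 dB
    per_frame = (w[used] * snr[used]).sum(axis=1) / frame_weight[used]
    return float(per_frame.mean())


def decomposed_features(clean_mag, enh_mag, weights):
    """One fSNRseg value per region: attenuation, mild and strong amplification."""
    masks = region_masks(clean_mag, enh_mag)
    return np.array([fsnrseg(clean_mag, enh_mag, m, weights) for m in masks])


def fit_regression(features, scores):
    """Least-squares fit of scores = b0 + b1*f1 + b2*f2 + b3*f3."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef


def predict(coef, features):
    """Predicted intelligibility scores from the fitted coefficients."""
    return coef[0] + np.asarray(features) @ coef[1:]


# One frame, three bands: attenuated, amplified by 1.5x, amplified by 3x.
clean = np.array([[1.0, 1.0, 1.0]])
enhanced = np.array([[0.5, 1.5, 3.0]])
print(decomposed_features(clean, enhanced, np.ones(3)))
# approximately [6.02, 6.02, -6.02]
```

In such a setup, each test condition would yield one three-element feature vector, and `fit_regression` would be run over all conditions against the measured sentence recognition scores; the correlation between `predict` output and those scores could then be checked with `np.corrcoef`.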