Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (2): 145-147.

• Database, Data Mining, Machine Learning •

Optimization of acoustic model for Uyghur continuous speech recognition

Nurmemet Yolwas, Wushour Silamu   

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Online:2013-01-15 Published:2013-01-16

Abstract: This paper applies the Tandem feature extraction method, which combines the strengths of the Gaussian mixture model and artificial neural network frameworks commonly used in speech recognition, to Uyghur acoustic model training. A series of post-processing steps converts the original Mel-frequency cepstral coefficient (MFCC) features into Tandem features, which serve as the input to a hidden Markov model (HMM) based speech recognition system. The acoustic model is then trained discriminatively under the minimum phone error (MPE) criterion, and recognition experiments are carried out on the test set. The results show that the MPE-trained acoustic model on Tandem features reduces the word error rate by a relative 13% compared with the original system trained under the maximum likelihood estimation criterion.

Key words: Uyghur, speech recognition, minimum phone error, Tandem feature
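
The abstract does not spell out the post-processing steps that turn MFCC features into Tandem features. The sketch below follows the commonly described Tandem recipe, which is an assumption here and not necessarily the authors' exact pipeline: take the log of the neural network's per-frame phone posteriors, decorrelate them with PCA, and append the result to the MFCCs. The function name `tandem_features`, its parameters, and the array shapes are all hypothetical, and random data stands in for real features and network outputs.

```python
import numpy as np

def tandem_features(mfcc, posteriors, n_components, eps=1e-10):
    """Build Tandem-style features from MFCCs and per-frame phone posteriors.

    mfcc:         (n_frames, n_mfcc) array of MFCC features.
    posteriors:   (n_frames, n_phones) array of neural-network phone
                  posteriors (each row sums to 1).
    n_components: number of PCA dimensions kept from the log posteriors.
    """
    # The log compresses the heavily skewed posterior distribution.
    log_post = np.log(posteriors + eps)

    # PCA via SVD of the centered data decorrelates the dimensions,
    # which suits diagonal-covariance GMMs in an HMM back end.
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T

    # Append the decorrelated posterior features to the original MFCCs.
    return np.hstack([mfcc, projected])


# Usage with random stand-in data (hypothetical shapes, not the paper's setup).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 39))
logits = rng.normal(size=(200, 32))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
feats = tandem_features(mfcc, posteriors, n_components=25)
print(feats.shape)  # (200, 64)
```

In a real system the PCA projection would be estimated once on training data and reused for test data. Here it is fitted on the same array only to keep the sketch short.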