Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (24): 117-121. DOI: 10.3778/j.issn.1002-8331.1808-0432

Research on Improving Phoneme Recognition Rate Based on Subspace Gaussian Mixture Model and Deep Neural Network Combination

JIA Bingbing, CAO Hui, QIN Chijie   

  1. School of Physics and Information Technology, Shaanxi Normal University, Xi’an 710119, China
  • Online: 2019-12-15 Published: 2019-12-11

基于SGMM和DNN结合提高音素识别率的研究

贾兵兵,曹辉,秦驰杰   

  1. 陕西师范大学 物理学与信息技术学院,西安 710119

Abstract: To reduce the phoneme recognition error rate of acoustic features in speech recognition systems and improve system performance, a feature extraction method that combines a Subspace Gaussian Mixture Model (SGMM) with a Deep Neural Network (DNN) is proposed. The parameter size of the SGMM is analyzed and, after its computational complexity has been reduced, the SGMM is connected in series with the DNN to further improve the phoneme recognition rate. Speech data that has undergone a nonlinear feature transformation is fed into the model to find the optimal configuration of the DNN structure, and a network model that learns and trains more reliably is built for feature extraction. System performance is judged by comparing phoneme recognition error rates. Experimental simulation results show that the features extracted by the proposed system are significantly better than those of a traditional acoustic model.

Key words: acoustic feature, phoneme recognition, subspace Gaussian mixture model, deep neural network

摘要: 为降低声学特征在语音识别系统中的音素识别错误率,提高系统性能,提出一种子空间高斯混合模型和深度神经网络结合提取特征的方法,分析了子空间高斯混合模型的参数规模并在减少计算复杂度后将其与深度神经网络串联进一步提高音素识别率。把经过非线性特征变换的语音数据输入模型,找到深度神经网络结构的最佳配置,建立学习与训练更可靠的网络模型进行特征提取,通过比较音素识别错误率来判断系统性能。实验仿真结果证明,基于该系统提取的特征明显优于传统声学模型。

关键词: 声学特征, 音素识别, 子空间高斯混合模型, 深度神经网络