Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (35): 35-38.

• Doctoral Forum •

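Continuous speech recognition based on a context-dependent triphone DBN model

LV Guo-yun1, ZHAO Rong-chun1, JIANG Dong-mei1, SAHLI H2

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072
  2. Department of Electronics and Informatics, Vrije Universiteit Brussel, Pleinlaan 2, B1050 Brussels, Belgium
  • Received: 1900-01-01 Revised: 1900-01-01 Published: 2007-12-11 Released: 2007-12-11
  • Corresponding author: LV Guo-yun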

DBN model based on triphone for continuous speech recognition

LV Guo-yun1, ZHAO Rong-chun1, JIANG Dong-mei1, SAHLI H2

  1. School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
  2. Department ETRO, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-12-11 Published:2007-12-11
  • Contact: LV Guo-yun

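Abstract (translated from the Chinese): To address coarticulation in continuous speech, two models are proposed: a single-stream context-dependent triphone dynamic Bayesian network model with word-internal expansion (SS-DBN-TRI), and a single-stream context-dependent triphone DBN model with cross-word expansion (SS-DBN-TRI-CON). The SS-DBN-TRI model improves on the single-stream DBN (SS-DBN) model proposed by Bilmes: word-internal context-dependent triphone nodes replace the monophone nodes, each word is composed of its corresponding triphone units, and the triphone units are linked to the observation vectors. The SS-DBN-TRI-CON model is also based on the SS-DBN model: adding a previous-phone node and a next-phone node for the current phone forms a new cross-word triphone variable node, which is linked to the observation vectors and described by a Gaussian mixture model. Experiments on a continuous digit speech database show that SS-DBN-TRI-CON achieves the best speech recognition performance.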

Key words (translated from the Chinese): dynamic Bayesian network, speech recognition, triphone, monophone, context-dependent

Abstract: To accurately capture the variations of real speech spectra, two single-stream Dynamic Bayesian Network (DBN) models based on context-dependent triphones, the SS-DBN-TRI model and the SS-DBN-TRI-CON model, are proposed for continuous speech recognition. The SS-DBN-TRI model extends the Single Stream DBN (SS-DBN) model proposed by Bilmes: the phone variable is replaced by a triphone variable generated by word-internal expansion. The SS-DBN-TRI-CON model is also based on the SS-DBN model: a previous-phone node and a next-phone node are added for the current phone, yielding a new triphone node that describes cross-word coarticulation in continuous speech. The new triphone node is associated with the observation, with the observation probabilities modeled by a Gaussian Mixture Model. Experiments on a continuous digit audio database show that the SS-DBN-TRI-CON model achieves the best word recognition performance.

Key words: Dynamic Bayesian Network(DBN), speech recognition, triphone, mono-phone, context-dependent