DBN model based on triphone for continuous speech recognition

Computer Engineering and Applications, 2007, Vol. 43, Issue 35: 35-38.

• Doctoral Forum •


LV Guo-yun¹, ZHAO Rong-chun¹, JIANG Dong-mei¹, SAHLI H²

1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
2. Department ETRO, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium

Received: 1900-01-01  Revised: 1900-01-01  Online: 2007-12-11  Published: 2007-12-11
Contact: LV Guo-yun

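Continuous speech recognition based on context-dependent triphone DBN models (translated Chinese title)

LV Guo-yun¹, ZHAO Rong-chun¹, JIANG Dong-mei¹, SAHLI H²

1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072
2. Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel, Pleinlaan 2, B1050 Brussels, Belgium

Corresponding author: LV Guo-yun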

Abstract

Abstract: To capture the variations of real speech spectra more accurately, two single-stream Dynamic Bayesian Network (DBN) models based on context-dependent triphones are proposed for continuous speech recognition: the SS-DBN-TRI model and the SS-DBN-TRI-CON model. SS-DBN-TRI augments the Single Stream DBN (SS-DBN) model proposed by Bilmes: the phone variable is replaced by an intra-word context-dependent triphone variable, so that each word is composed of its triphone units and each triphone unit is associated with the observation. SS-DBN-TRI-CON is also built on the SS-DBN model: a previous-phone node and a next-phone node are added for the current phone, yielding a new triphone node that describes the co-articulation of continuous speech across word boundaries (inter-word). This new triphone node is associated with the observation, and the observation probabilities are modeled by Gaussian Mixture Models. Experiments on a continuous digit audio database show that the SS-DBN-TRI-CON model achieves the best word recognition performance.

Key words: Dynamic Bayesian Network (DBN), speech recognition, triphone, monophone, context-dependent
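
The difference between the intra-word units of SS-DBN-TRI and the inter-word units of SS-DBN-TRI-CON can be illustrated with a minimal Python sketch. It only shows which context each phone receives, not the DBN structure itself. The two-word lexicon, the function names, and the `left-center+right` label convention are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical pronunciation lexicon for two digits (illustrative, not from the paper).
LEXICON: Dict[str, List[str]] = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}


def label(left: Optional[str], center: str, right: Optional[str]) -> str:
    """Format a unit as left-center+right, omitting any missing context."""
    name = center
    if left is not None:
        name = f"{left}-{name}"
    if right is not None:
        name = f"{name}+{right}"
    return name


def expand(phones: List[str]) -> List[str]:
    """Attach the previous and next phone (when present) to every phone."""
    units = []
    for i, center in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        units.append(label(left, center, right))
    return units


def intra_word_triphones(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand each word separately, so context never crosses a word boundary."""
    units: List[str] = []
    for word in words:
        units.extend(expand(lexicon[word]))
    return units


def inter_word_triphones(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand the whole utterance at once, so context crosses word boundaries."""
    phones = [p for word in words for p in lexicon[word]]
    return expand(phones)


if __name__ == "__main__":
    utterance = ["one", "two"]
    print(intra_word_triphones(utterance, LEXICON))
    print(inter_word_triphones(utterance, LEXICON))
```

For the utterance "one two", the two expansions differ only at the word boundary: the intra-word version ends the first word with `ah-n` and starts the second with `t+uw`, while the inter-word version produces `ah-n+t` and `n-t+uw`.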

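Abstract (translated from the Chinese): To address co-articulation in continuous speech, a single-stream context-dependent triphone Dynamic Bayesian Network model with intra-word expansion (SS-DBN-TRI) and a single-stream context-dependent triphone DBN model with inter-word expansion (SS-DBN-TRI-CON) are proposed. SS-DBN-TRI improves on the single-stream DBN (SS-DBN) model proposed by Bilmes: intra-word context-dependent triphone nodes replace the monophone nodes, each word is composed of its corresponding triphone units, and the triphone units are associated with the observation vectors. SS-DBN-TRI-CON is based on the SS-DBN model: by adding a previous-phone node and a next-phone node for the current phone, it forms a new inter-word triphone variable node, which is associated with the observation vectors and described by a Gaussian mixture model. Experiments on a continuous digit speech database show that SS-DBN-TRI-CON achieves the best speech recognition performance.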

Key words (translated from the Chinese): dynamic Bayesian network, speech recognition, triphone, monophone, context-dependent
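
The abstract states that the probabilities linking a triphone node to the observation are modeled by Gaussian mixture models. As a rough illustration of what such a score looks like, the sketch below computes the log-likelihood of one observation vector under a GMM. The diagonal covariance, the function names, and the toy parameters are assumptions for illustration; the abstract does not specify the covariance type or the number of mixture components.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance (assumed here)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(
    x: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """log sum_k w_k N(x; mu_k, var_k), computed with the log-sum-exp trick."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    # Toy two-component mixture over 2-dimensional features (hypothetical values).
    weights = [0.6, 0.4]
    means = [[0.0, 0.0], [1.0, 1.0]]
    variances = [[1.0, 1.0], [0.5, 0.5]]
    print(gmm_log_likelihood([0.2, -0.1], weights, means, variances))
```

In a triphone model of this kind, each triphone unit (or each of its states) would typically have its own set of mixture parameters, so the same observation is scored differently depending on the phonetic context.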

LV Guo-yun, ZHAO Rong-chun, JIANG Dong-mei, SAHLI H. DBN model based on triphone for continuous speech recognition[J]. Computer Engineering and Applications, 2007, 43(35): 35-38.

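LV Guo-yun, ZHAO Rong-chun, JIANG Dong-mei, SAHLI H. Continuous speech recognition based on context-dependent triphone DBN models[J]. Computer Engineering and Applications (计算机工程与应用), 2007, 43(35): 35-38. (Chinese-language citation, translated)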

