Lip-Reading and Viseme Segmentation Based on BTSM and DBN Models

Computer Engineering and Applications (计算机工程与应用), 2007, Vol. 43, Issue 14: 21-24.

Guoyun Lü (吕国云), Rongchun Zhao (赵荣椿), Dongmei Jiang (蒋冬梅), Xiaoyue Jiang (蒋晓悦), Yunshu Hou (侯云舒), H. Sahli

Northwestern Polytechnical University (one affiliation entry specifies Department 11)

Received: 2007-02-02; Revised: 1900-01-01; Published: 2007-05-10; Online: 2007-05-10
Corresponding author: Guoyun Lü (吕国云)

BTSM AND DBN MODEL FOR CONTINUOUS SPEECH RECOGNITION AND VISEME SEGMENTATION

Guoyun Lü, Rongchun Zhao, Dongmei Jiang, Xiaoyue Jiang, Yunshu Hou, Hichem Sahli

Abstract

To support a text- or speech-driven talking head, this paper proposes a mouth-contour feature extraction method based on the Bayesian Tangent Shape Model (BTSM) and a lip-reading system based on a Dynamic Bayesian Network (DBN) model. The DBN model describes the relationship between each word and its component visemes, which yields a viseme segmentation sequence with time boundaries. For comparison, the phoneme recognition results of a DBN model based on the word-phone relationship and of a triphone HMM are mapped to viseme sequences. For evaluation, the paper proposes an absolute Viseme Segmentation Accuracy (VSA) and two relative VSA measures, one based on image features and one based on geometric features of the lips. Experiments show that the DBN models outperform the HMM in recognition, and that the viseme-based DBN model provides the best mouth shapes for the talking head.
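The abstract mentions mapping phoneme recognition results to viseme sequences and scoring the resulting segmentation, but it gives neither the mapping table nor the VSA formulas. The Python sketch below is only an illustration under stated assumptions: the table `PHONE_TO_VISEME`, the viseme labels, and the helper names are hypothetical, and plain frame-level agreement is used as a stand-in score, not as the paper's definition of absolute VSA.

```python
from typing import Dict, List, Tuple

# A segment is (label, start_frame, end_frame), with end_frame exclusive.
Segment = Tuple[str, int, int]

# Hypothetical phoneme-to-viseme table (assumption; not taken from the paper).
PHONE_TO_VISEME: Dict[str, str] = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "aa": "V_open", "ae": "V_open",
    "sil": "V_sil",
}


def phones_to_visemes(phones: List[Segment], table: Dict[str, str]) -> List[Segment]:
    """Map phoneme segments to viseme segments, merging contiguous equal visemes."""
    visemes: List[Segment] = []
    for label, start, end in phones:
        viseme = table[label]
        if visemes and visemes[-1][0] == viseme and visemes[-1][2] == start:
            # Same viseme continues: extend the previous segment.
            _, prev_start, _ = visemes.pop()
            visemes.append((viseme, prev_start, end))
        else:
            visemes.append((viseme, start, end))
    return visemes


def to_frames(segments: List[Segment], n_frames: int) -> List[str]:
    """Expand segments into one label per frame; uncovered frames stay empty."""
    frames = [""] * n_frames
    for label, start, end in segments:
        for t in range(start, min(end, n_frames)):
            frames[t] = label
    return frames


def frame_accuracy(reference: List[Segment], hypothesis: List[Segment], n_frames: int) -> float:
    """Fraction of frames whose hypothesis viseme equals the reference viseme.

    Assumed stand-in metric; the paper's absolute and relative VSA may differ.
    """
    ref = to_frames(reference, n_frames)
    hyp = to_frames(hypothesis, n_frames)
    matches = sum(1 for r, h in zip(ref, hyp) if r == h and r != "")
    return matches / n_frames


if __name__ == "__main__":
    recognized_phones = [("sil", 0, 3), ("p", 3, 5), ("m", 5, 7), ("aa", 7, 10)]
    hypothesis = phones_to_visemes(recognized_phones, PHONE_TO_VISEME)
    reference = [("V_sil", 0, 2), ("V_bilabial", 2, 7), ("V_open", 7, 10)]
    print(hypothesis)
    print(frame_accuracy(reference, hypothesis, n_frames=10))
```

Merging contiguous phonemes that share a viseme reflects the fact that several phonemes can look alike on the lips, so the viseme sequence is usually shorter than the phoneme sequence it came from.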

Guoyun Lü, Rongchun Zhao, Dongmei Jiang, Xiaoyue Jiang, Yunshu Hou, Hichem Sahli. BTSM AND DBN MODEL FOR CONTINUOUS SPEECH RECOGNITION AND VISEME SEGMENTATION[J]. Computer Engineering and Applications, 2007, 43(14): 21-24.