Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (14): 21-24.

• Doctoral Forum •

BTSM AND DBN MODEL FOR CONTINUOUS SPEECH RECOGNITION AND VISEME SEGMENTATION

Dongmei Jiang, Xiaoyue Jiang, Yunshu Hou, Hichem Sahli

  • Received: 2007-02-02 Online: 2007-05-10 Published: 2007-05-10

Research on Lip-Reading and Viseme Segmentation Based on BTSM and DBN Models

Guoyun Lü (吕国云), Rongchun Zhao (赵荣椿), Dongmei Jiang (蒋冬梅), Xiaoyue Jiang (蒋晓悦), Yunshu Hou (侯云舒), H. Sahli

  1. Northwestern Polytechnical University (including Department 11)
  • Corresponding author: Guoyun Lü (吕国云)

Abstract: This paper proposes a mouth-outline feature extraction method based on the Bayesian Tangent Shape Model (BTSM) and a lip-reading system based on a Dynamic Bayesian Network (DBN) for driving a talking head. The DBN model describes the relationship between each word and its constituent visemes, which yields a viseme segmentation sequence with time boundaries. For comparison, a DBN model based on the word-phone relationship and a triphone HMM are also used. For system evaluation, an absolute Viseme Segmentation Accuracy (VSA) and two relative VSAs, based on image features and on geometric features of the lips, are proposed. Experiments show that the DBN models outperform the HMM, and that the viseme-based DBN model provides the best mouth shapes for the talking head.

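Abstract (translated from Chinese): To realize text/speech-driven talking-head animation, this paper proposes a mouth-outline feature extraction method based on the Bayesian Tangent Shape Model and a lip-reading system based on a Dynamic Bayesian Network (DBN) model. By describing the relationship between a word and its constituent visemes, a viseme sequence segmented in time is obtained. For performance comparison, the phone recognition results of a phone-based DBN model and of an HMM are mapped to viseme sequences. As evaluation criteria, an absolute viseme segmentation accuracy and two relative viseme segmentation accuracies, based on image features and on lip geometric features, are proposed. Experiments show that the DBN model achieves better recognition performance than the HMM, and that the viseme-based DBN model provides the best mouth shapes for talking-head animation.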
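
The comparison described in the abstract, where phone-level recognition output is mapped to a viseme sequence and scored against a reference segmentation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `PHONE_TO_VISEME` table, the viseme labels, the segment format, and the frame-level agreement score are hypothetical. The abstract does not give the paper's phone-to-viseme mapping or the exact definition of its absolute or relative VSA, so this is not the authors' metric.

```python
from typing import Dict, List, Tuple

# A segment is (label, start_frame, end_frame) with an exclusive end frame.
Segment = Tuple[str, int, int]

# Hypothetical, heavily truncated phone-to-viseme table (illustration only;
# the paper's actual mapping is not given in the abstract).
PHONE_TO_VISEME: Dict[str, str] = {
    "sil": "V_sil",
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
}


def phones_to_visemes(phones: List[Segment]) -> List[Segment]:
    """Map phone segments to viseme segments, merging adjacent equal visemes."""
    visemes: List[Segment] = []
    for label, start, end in phones:
        vis = PHONE_TO_VISEME[label]
        if visemes and visemes[-1][0] == vis and visemes[-1][2] == start:
            # Same viseme continues: extend the previous segment.
            visemes[-1] = (vis, visemes[-1][1], end)
        else:
            visemes.append((vis, start, end))
    return visemes


def to_frames(segments: List[Segment]) -> List[str]:
    """Expand segments into one label per frame.

    Assumes the segments are contiguous and start at frame 0.
    """
    frames: List[str] = []
    for label, start, end in segments:
        frames.extend([label] * (end - start))
    return frames


def frame_agreement(ref: List[Segment], hyp: List[Segment]) -> float:
    """Fraction of reference frames whose viseme label matches the hypothesis.

    An assumed, simple frame-level score; not the paper's VSA definition.
    """
    ref_frames, hyp_frames = to_frames(ref), to_frames(hyp)
    if not ref_frames:
        return 0.0
    matches = sum(r == h for r, h in zip(ref_frames, hyp_frames))
    return matches / len(ref_frames)


# Usage: a recognized phone segmentation versus a reference viseme segmentation.
recognized_phones = [("sil", 0, 2), ("p", 2, 4), ("b", 4, 6), ("f", 6, 8)]
hypothesis = phones_to_visemes(recognized_phones)
reference = [("V_sil", 0, 3), ("V_bilabial", 3, 6), ("V_labiodental", 6, 8)]
score = frame_agreement(reference, hypothesis)
```

Scoring per frame rather than per segment makes boundary errors count in proportion to their duration, which is one plausible way to reward accurate time boundaries. A segment-level or feature-distance score, closer in spirit to the relative measures the abstract mentions, would need the lip image or geometric features themselves.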