Computer Engineering and Applications (计算机工程与应用), 2014, Vol. 50, Issue (20): 215-217.

• Signal Processing •

Speaker identification based on sparse representation

LV Xiaoting (吕小听), LI Xin (李昕), QU Yanqin (屈燕琴), HU Chen (胡晨)

  1. School of Mechatronics Engineering and Automation, Shanghai University, Shanghai 200072, China
  • Online: 2014-10-15 Published: 2014-10-28

Abstract: Sparse signal theory has attracted growing attention in recent years, and the sparse representation classifier has been applied to speaker identification as a new classification method. The basic idea is that, provided the over-complete dictionary is large enough, any test utterance can be represented as a linear combination of the training samples in that dictionary. Under sparsity theory, the coefficient vector of an unknown speaker (the sparse solution), whose dominant entries indicate the class of the test utterance, can be obtained by L1-norm minimization. The over-complete dictionary is built by adapting speech feature vectors to a Gaussian Mixture Model-Universal Background Model (GMM-UBM) using Maximum-A-Posteriori (MAP) adaptation. This paper uses the sparse representation model as the classifier for speaker identification. Experiments on the TIMIT acoustic-phonetic continuous speech corpus show that the proposed method greatly improves system performance and consistently outperforms state-of-the-art speaker identification classifiers.

Key words: sparse representation, Gaussian Mixture Model (GMM) mean supervectors, over-complete dictionary, Maximum-A-Posteriori (MAP) algorithm
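
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name an L1 solver, so this example substitutes ISTA (iterative soft thresholding) applied to the LASSO relaxation of the L1 problem, and it replaces real GMM mean supervectors with synthetic random vectors. All function names (`ista_l1`, `src_identify`, `make_toy_dictionary`) and parameter values are hypothetical.

```python
import numpy as np


def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.

    Assumption: this LASSO relaxation stands in for the L1-norm
    minimization mentioned in the abstract.
    """
    # Step size 1/L, where L is the squared largest singular value of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))  # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x


def src_identify(A, labels, y, lam=0.01):
    """Return (speaker, coefficients) using class-wise reconstruction residuals."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm dictionary columns
    y = y / np.linalg.norm(y)
    x = ista_l1(A, y, lam)
    speakers = np.unique(labels)
    # Reconstruct y from each speaker's columns only; the smallest residual wins.
    residuals = [
        np.linalg.norm(y - A[:, labels == s] @ x[labels == s]) for s in speakers
    ]
    return speakers[int(np.argmin(residuals))], x


def make_toy_dictionary(n_speakers=3, per_speaker=4, dim=20, seed=0):
    """Build a synthetic dictionary (hypothetical stand-in for supervectors)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_speakers, dim))
    columns, labels = [], []
    for s in range(n_speakers):
        for _ in range(per_speaker):
            columns.append(centers[s] + 0.1 * rng.normal(size=dim))
            labels.append(s)
    return np.array(columns).T, np.array(labels), centers
```

In a real system, each dictionary column would be a MAP-adapted GMM mean supervector from a training utterance, and `y` would be the supervector of the test utterance. Calling `src_identify(A, labels, y)` on the toy dictionary with a noisy copy of one speaker's center returns that speaker's label together with the coefficient vector.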