Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (11): 179-182. DOI: 10.3778/j.issn.1002-8331.2010.11.055

• Graphics, Image and Pattern Recognition •

Research on GMM text-independent speaker recognition

JIANG Ye, TANG Zhen-min

  1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2008-10-07 Revised: 2009-01-06 Online: 2010-04-11 Published: 2010-04-11
  • Contact: JIANG Ye

Abstract: This paper improves on the traditional methods of initializing Gaussian Mixture Model (GMM) parameters during training, namely random initialization and K-means clustering, and proposes a new approach that combines the splitting (division) method with K-means clustering. Experiments show that the proposed method raises the average recognition rate by 15.47% and 7.5% over random initialization and K-means clustering, respectively. The effects of the GMM order, the covariance threshold, and the pre-emphasis coefficient on the recognition rate are also studied. The experimental results are analyzed in detail, and the best-performing value of each parameter is chosen from the experimental data so that the resulting speaker recognition system achieves a higher recognition rate. Under the specified experimental conditions, the system achieves a recognition rate above 90%.

Key words: speaker recognition, Gaussian Mixture Model (GMM), Mel-Frequency Cepstral Coefficients (MFCC), combined splitting and K-means clustering
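
The abstract names the ingredients of the method but not its details, so the following is only a minimal sketch of how a splitting-plus-K-means initialization for a diagonal-covariance GMM might look. It assumes an LBG-style binary split (each centroid is perturbed into a pair, then refined with K-means), which may differ from the authors' procedure. The function names, the perturbation size `eps`, the variance floor `var_floor` (standing in for the covariance threshold), and the pre-emphasis coefficient `alpha=0.97` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every frame."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, centers, n_iter=20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers, assign(X, centers)


def split_kmeans_init(X, n_components, eps=0.01, var_floor=1e-3):
    """Initial GMM weights, means and diagonal variances via splitting + K-means.

    Assumption: LBG-style doubling, so n_components must be a power of two.
    """
    X = np.asarray(X, dtype=float)
    if n_components < 1 or n_components & (n_components - 1):
        raise ValueError("this sketch needs a power-of-two n_components")
    delta = eps * X.std(axis=0)               # small perturbation per dimension
    centers = X.mean(axis=0, keepdims=True)   # start from one global centroid
    while len(centers) < n_components:
        # Split every centroid into a perturbed pair, then refine with K-means.
        centers = np.vstack([centers + delta, centers - delta])
        centers, _ = kmeans(X, centers)
    centers, labels = kmeans(X, centers)
    counts = np.bincount(labels, minlength=n_components)
    weights = counts / counts.sum()
    global_var = X.var(axis=0)
    variances = np.array([
        X[labels == k].var(axis=0) if counts[k] > 1 else global_var
        for k in range(n_components)
    ])
    # Floor the variances so that no component collapses onto a few frames.
    return weights, centers, np.maximum(variances, var_floor)
```

In a pipeline of this kind, `X` would be the matrix of MFCC frames for one speaker, and the returned weights, means and variances would seed EM training of that speaker's GMM. Sweeping `n_components`, `var_floor` and `alpha` corresponds to the three parameters whose influence the paper studies.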
