Computer Engineering and Applications, 2011, Vol. 47, Issue (11): 114-117.

• Database, Signal and Information Processing •

Research on GMM-based speaker recognition technology

CAO Jie, PAN Peng

  1. College of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-04-11 Published: 2011-04-11

Abstract: To investigate the role of the Gaussian Mixture Model (GMM) in speaker recognition, a GMM-based speaker recognition system is designed. The system consists of four modules: audio signal pre-processing, speech activity detection, speaker modeling, and audio signal recognition. The first three modules form the model-training part of the system, and the last module forms the speech recognition part. The speech activity detector in the second module, which is itself built from GMMs, is the novel contribution of the research. Several tunable parameters and the recognition error rate of the system are tested on audio-visual meetings from the Augmented Multi-party Interaction (AMI) corpus. Simulation results show that, with the help of the speech activity detector and several filtering algorithms, the recognition accuracy of the system on audio signals containing overlapping speech can reach 83.02%.

Key words: Gaussian Mixture Model (GMM), speech activity detection, recognition error rate
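
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind GMM-based speaker modeling and recognition, not the authors' system. It assumes diagonal-covariance GMMs trained with plain EM, and it uses synthetic 4-dimensional vectors as stand-ins for real acoustic features. All function names (`fit_diag_gmm`, `component_log_probs`, `avg_log_likelihood`, `identify`) and all parameter values are hypothetical. One GMM is fitted per speaker, and a test segment is assigned to the speaker whose model gives the highest average log-likelihood.

```python
import numpy as np


def component_log_probs(X, weights, means, variances):
    """Return log(w_k * N(x_t | mu_k, diag(var_k))), shape (frames, components)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss


def fit_diag_gmm(X, n_components=2, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise means from random frames and variances from the global variance.
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = component_log_probs(X, weights, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, var_floor)
    return weights, means, variances


def avg_log_likelihood(X, model):
    """Average per-frame log-likelihood of X under a fitted GMM."""
    log_p = component_log_probs(X, *model)
    return float(np.logaddexp.reduce(log_p, axis=1).mean())


def identify(X, models):
    """Pick the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: avg_log_likelihood(X, models[label]))


# Synthetic "feature frames" for two hypothetical speakers (assumed data).
rng = np.random.default_rng(1)
train_a = rng.normal(loc=-2.0, scale=1.0, size=(400, 4))
train_b = rng.normal(loc=2.0, scale=1.0, size=(400, 4))
models = {"speaker_a": fit_diag_gmm(train_a), "speaker_b": fit_diag_gmm(train_b)}

test_a = rng.normal(loc=-2.0, scale=1.0, size=(100, 4))
test_b = rng.normal(loc=2.0, scale=1.0, size=(100, 4))
print(identify(test_a, models), identify(test_b, models))
```

The same likelihood comparison could in principle serve as a speech activity detector by training one GMM on speech frames and one on non-speech frames, although whether the paper's detector works this way is not stated in the abstract.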