Computer Engineering and Applications, 2016, Vol. 52, Issue (18): 144-147.

• Pattern Recognition and Artificial Intelligence •

Speaker verification based on i-vector and sparse representation using PCA dictionary learning

SHU Yi¹, XING Yujuan²

  1. Gansu Computing Center, Lanzhou 730000, China
  2. School of Digital Media, Lanzhou University of Arts and Science, Lanzhou 730000, China
  • Publication date: 2016-09-15  Release date: 2016-09-14

Abstract: Sparse representation has become a focus of speaker verification research because of its strong classification performance. Construction of the over-complete dictionary is the key step and directly affects that performance. To improve the robustness of speaker verification and to reduce the noise and channel interference present in the over-complete dictionary, this paper proposes a principal-component sparse representation dictionary learning algorithm based on i-vectors. The algorithm extracts speaker i-vectors on the basis of a Gaussian mixture model-universal background model (GMM-UBM) and applies within-class covariance normalization (WCCN) to them for channel compensation. It then estimates a channel offset space from the mean vectors of the compensated speaker i-vectors and applies principal component analysis (PCA) in that space to extract low-dimensional channel offset principal components. These components are used to recompute the speaker i-vectors, which further suppresses channel interference, and the new i-vectors serve as the atoms of a robust over-complete dictionary. In the testing phase, the sparse representation coefficient vector of the test i-vector is found over this dictionary, and the target speaker is determined from the error with which the coefficient vector reconstructs the test i-vector. Simulation experiments show that the algorithm achieves good recognition performance.

Key words: speaker verification, i-vector, sparse representation, over-complete dictionary, Gaussian mixture model-universal background model
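
The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions. The i-vectors are taken as already extracted. The "recomputation" step is assumed to mean projecting out the top `k` PCA directions of the per-speaker offsets. Orthogonal matching pursuit (OMP) stands in for the unspecified sparse solver. All function names, `n_nonzero`, and `threshold` are hypothetical.

```python
import numpy as np

def wccn(X, labels):
    """Return B with B @ B.T = inv(W), W being the within-class covariance."""
    d = X.shape[1]
    W = np.zeros((d, d))
    speakers = np.unique(labels)
    for s in speakers:
        Xs = X[labels == s]
        D = Xs - Xs.mean(axis=0)
        W += D.T @ D / len(Xs)
    W = W / len(speakers) + 1e-6 * np.eye(d)  # small ridge (assumption)
    return np.linalg.cholesky(np.linalg.inv(W))

def channel_offset_basis(X, labels, k):
    """PCA on the offsets of each i-vector from its speaker mean; returns d x k."""
    offsets = np.vstack([X[labels == s] - X[labels == s].mean(axis=0)
                         for s in np.unique(labels)])
    _, _, Vt = np.linalg.svd(offsets, full_matrices=False)
    return Vt[:k].T

def project(X, B, U):
    """Apply WCCN, then remove the channel offset components (assumed step)."""
    Xc = X @ B
    return Xc - (Xc @ U) @ U.T

def build_dictionary(X, labels, B, k):
    """Dictionary atoms are the recomputed, unit-norm i-vectors (as columns)."""
    U = channel_offset_basis(X @ B, labels, k)
    Xn = project(X, B, U)
    D = (Xn / np.linalg.norm(Xn, axis=1, keepdims=True)).T
    return D, U

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit; n_nonzero must be at least 1."""
    residual, support = y.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef

def verify(D, atom_labels, y, claimed, n_nonzero=5, threshold=0.5):
    """Accept the claim if the claimed speaker's atoms alone reconstruct y well."""
    y = y / np.linalg.norm(y)
    coef = omp(D, y, n_nonzero)
    coef_claimed = np.where(atom_labels == claimed, coef, 0.0)
    err = float(np.linalg.norm(y - D @ coef_claimed))
    return err, bool(err < threshold)
```

A test i-vector `x` would pass through the same transform before scoring, for example `verify(D, labels, project(x, B, U), claimed)`. In practice the WCCN matrix is usually estimated on a separate development set rather than on the enrollment data used for the dictionary, and the decision threshold would be tuned on held-out trials.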