Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (8): 198-203. DOI: 10.3778/j.issn.1002-8331.2012-0298

• Pattern Recognition and Artificial Intelligence •

Speaker Verification Based on Teacher-Free Knowledge Distillation Model

XIAO Jinzhuang, LI Ruipeng, JI Mengmeng   

  1. College of Electronic Information Engineering, Hebei University, Baoding, Hebei 071000, China
  • Online: 2022-04-15  Published: 2022-04-15

Abstract: Text-independent speaker verification models achieve strong performance through complex network structures and varied feature-extraction methods; however, this incurs large memory consumption and growing computing costs, making the models difficult to deploy on resource-limited hardware. To address this problem, this work exploits teacher-free knowledge distillation (Tf-KD), in which a hand-crafted virtual teacher provides 100% classification accuracy and a smoothed output probability distribution, to build a teacher-free speaker verification (Tf-SV) model on top of a lightweight residual network. In addition, a spatially shared, channel-wise dynamic rectified linear unit activation and the additive angular margin loss (AAM-Softmax) are introduced, which substantially improve the model's feature representation, training efficiency, and post-compression performance, so that the Tf-SV model can be deployed on devices with limited storage or computing resources. Experiments on the VoxCeleb1 dataset show that the equal error rate (EER) of the Tf-SV model is reduced to 3.4%, a clear improvement over previously published results, demonstrating the effectiveness of the proposed compressed model for speaker verification.
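
For readers unfamiliar with the two training objectives named above, the following is a minimal PyTorch sketch, not the authors' implementation: a Tf-KD virtual-teacher loss and an AAM-Softmax head. The function names and the hyperparameter defaults (a, T, alpha, s, m) are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def tf_kd_loss(logits, labels, num_classes, a=0.99, T=20.0, alpha=0.9):
    # Virtual teacher: probability `a` on the true class, the remainder spread
    # uniformly -- a 100%-accurate teacher with a smoothed output distribution.
    p_teacher = torch.full_like(logits, (1.0 - a) / (num_classes - 1))
    p_teacher.scatter_(1, labels.unsqueeze(1), a)
    ce = F.cross_entropy(logits, labels)
    # KL divergence between the temperature-softened student and the teacher.
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), p_teacher,
                  reduction="batchmean") * (T * T)
    return (1.0 - alpha) * ce + alpha * kd

class AAMSoftmax(nn.Module):
    # Additive angular margin (ArcFace-style) head: a margin m is added to the
    # angle of the true class before scaling, tightening same-speaker clusters.
    def __init__(self, emb_dim, num_classes, s=30.0, m=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine similarity between L2-normalized embeddings and class weights.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        target = torch.cos(theta + self.m)  # cos(theta + m) for the true class
        onehot = F.one_hot(labels, cos.size(1)).bool()
        return self.s * torch.where(onehot, target, cos)
```

Because the virtual teacher is a fixed analytic distribution rather than a trained network, the usual cost of training and querying a large teacher disappears from the compression pipeline, which is what makes the approach attractive for lightweight models.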

Key words: teacher-free knowledge distillation, dynamic rectified linear unit, additive angular margin loss function, model compression, speaker verification
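
The headline metric, equal error rate (EER), is the operating point at which the false-acceptance and false-rejection rates coincide. Below is a minimal sketch of the standard computation from verification trial scores; `compute_eer` is a hypothetical helper for illustration, not the paper's evaluation code.

```python
import numpy as np

def compute_eer(scores, labels):
    """scores: similarity per trial; labels: 1 = same speaker, 0 = impostor.
    EER is where the false-acceptance and false-rejection rates are equal."""
    order = np.argsort(scores)[::-1]        # sweep thresholds from high to low
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                  # targets accepted at each cutoff
    fp = np.cumsum(1 - labels)              # impostors accepted at each cutoff
    fnr = 1.0 - tp / labels.sum()           # false-rejection rate
    fpr = fp / (1 - labels).sum()           # false-acceptance rate
    i = np.nanargmin(np.abs(fnr - fpr))     # point where the two rates cross
    return (fnr[i] + fpr[i]) / 2.0

# Usage: scores from, e.g., cosine similarity of speaker embeddings;
# the paper reports an EER of 3.4% on VoxCeleb1 under this metric.
```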