Computer Engineering and Applications, 2020, Vol. 56, Issue 7: 8-16. DOI: 10.3778/j.issn.1002-8331.1911-0415

Survey of Speaker Recognition in Deep Learning Framework

ZENG Chunyan, MA Chaofeng, WANG Zhifeng, ZHU Dongliang, ZHAO Nan, WANG Juan, LIU Cong   

  1. Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China
  2. Department of Digital Media Technology, Central China Normal University, Wuhan 430079, China
  • Online: 2020-04-01   Published: 2020-03-28

深度学习框架下说话人识别研究综述

曾春艳,马超峰,王志锋,朱栋梁,赵楠,王娟,刘聪   

  1. 湖北工业大学 太阳能高效利用及储能运行控制湖北省重点实验室,武汉 430068
  2. 华中师范大学 数字媒体技术系,武汉 430079

Abstract:

Owing to its convenience, economy, and accuracy, speaker recognition has become an important identity authentication technology in daily life and work. In practical applications, however, complex environments pose a major challenge to the robustness of speaker recognition systems. In recent years, deep learning has performed well in feature expression and pattern classification, opening a new direction for the development of speaker recognition technology. In contrast to traditional speaker recognition techniques (such as GMM-UBM, GMM-SVM, JFA, and i-vector), this paper focuses on speaker recognition methods built on deep learning. According to the role that deep learning plays in speaker recognition, existing work is divided into three categories: feature expression based on deep learning, back-end modeling based on deep learning, and end-to-end joint optimization. The characteristics and network structures of typical algorithms in each category are analyzed and summarized, and their performance is compared. Finally, the paper summarizes the properties and advantages of deep learning in speaker recognition, analyzes the problems and challenges that speaker recognition research currently faces, and discusses the prospects for speaker recognition under the deep learning framework, with the aim of promoting further development of the technology.

Key words: speaker recognition, deep learning, feature expression, pattern classification, end-to-end

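Abstract (translated from the Chinese):

Owing to its distinctive advantages of convenience, economy, and accuracy, speaker recognition has become an important means of identity authentication in people's daily life and work. In practical application scenarios, however, great challenges arise for the accuracy, robustness, transferability, and real-time performance of speaker recognition systems. In recent years, deep learning has performed well in feature expression and pattern classification, providing a new direction for the further development of speaker recognition technology. In contrast to traditional speaker recognition techniques (such as GMM-UBM, GMM-SVM, JFA, and i-vector), this paper focuses on speaker recognition methods under the deep learning framework. According to the way deep learning is used in speaker recognition, current research is divided into three categories: feature expression based on deep learning, back-end modeling based on deep learning, and end-to-end joint optimization. The characteristics and network structures of the typical algorithms are analyzed and summarized, and their performance is compared. Finally, the paper summarizes the application properties and advantages of deep learning in speaker recognition, further analyzes the problems and challenges currently facing speaker recognition research, and looks ahead to the prospects for speaker recognition research under the deep learning framework, with the aim of promoting the further development of speaker recognition technology.

Key words (translated from the Chinese): speaker recognition, deep learning, feature expression, pattern classification, end-to-end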