Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (4): 59-71. DOI: 10.3778/j.issn.1002-8331.2405-0425

• Hot Topics and Reviews •

面向低数据资源的语音识别研究综述

许春冬,吴子煜,葛凤培   

  1. 江西理工大学 信息工程学院,江西 赣州 341000
  2. 北京邮电大学,北京 100876
  • 出版日期:2025-02-15 发布日期:2025-02-14

Review of Speech Recognition Techniques for Low Data Resources

XU Chundong, WU Ziyu, GE Fengpei   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  2. Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Online: 2025-02-15 Published: 2025-02-14

摘要: 近年来,自动语音识别的研究重心由传统识别方法转向基于深度学习的语音识别方法。“大模型”现象反映出深度学习方法的性能随着训练数据量的增加呈现显著上升的趋势。然而,现实环境的复杂性、语音数据分布的非均匀性和用户隐私的保护等因素给数据的收集造成困难。同时,语音数据的标注需要大量专业人员的参与,导致标注成本很高。因此,语音识别在实际应用中经常面临数据资源不足的问题。在这种低数据资源条件下构建性能优异且稳定的语音识别系统仍是研究难点。简单归纳了语音识别的发展历程,总结了语音识别的基本框架以及常见的国内外开源数据集。围绕低数据资源问题,详细分析了低数据资源的判定方法,继而梳理了四类技术方案,包括数据增强、联邦学习、自监督学习以及元学习,并对它们的性能状况以及优缺点进行了系统的剖析。最后讨论了该研究方向未来潜在的发展趋势和可能面临的问题。

关键词: 语音识别, 低数据资源, 数据增强, 联邦学习, 自监督学习, 元学习

Abstract: In recent years, the focus of automatic speech recognition research has shifted from traditional methods to deep-learning-based methods. The “large model” phenomenon shows that the performance of deep learning methods improves markedly as the volume of training data grows. However, the complexity of real-world environments, the uneven distribution of speech data, and the need to protect user privacy make data collection difficult. In addition, annotating speech data requires many trained professionals, which makes labeling expensive. As a result, speech recognition often faces insufficient data resources in practical applications, and building a high-performing, stable speech recognition system under such low-data-resource conditions remains a research challenge. This paper briefly summarizes the development history of speech recognition and outlines its basic framework and the common open-source datasets available in China and abroad. Focusing on the low-data-resource problem, it analyzes in detail how a low-data-resource condition is determined, then reviews four categories of technical solutions (data augmentation, federated learning, self-supervised learning, and meta-learning) and systematically analyzes their performance, advantages, and disadvantages. Finally, it discusses potential future trends in this research direction and the problems it may face.

Key words: speech recognition, low data resources, data augmentation, federated learning, self-supervised learning, meta-learning