Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (12): 25-36. DOI: 10.3778/j.issn.1002-8331.2201-0371

• Hot Topics and Reviews •


Research on Transformer-Based Single-Channel Speech Enhancement

FAN Junyi, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan   

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
    2. Department of Test Control, High-Tech Institute, Weifang, Shandong 262500, China
  • Online:2022-06-15 Published:2022-06-15


Abstract: Deep learning can effectively model the complex mapping between noisy and clean speech signals and thereby improve the quality of single-channel speech enhancement, but the quality of the enhanced speech is still not ideal. The Transformer has been widely used in speech signal processing: because it integrates a multi-head attention mechanism, it can better attend to the long-term correlations in speech and thus further improve enhancement performance. On this basis, this paper reviews deep learning-based speech enhancement models, summarizes the Transformer model and its internal structure, classifies Transformer-based speech enhancement models according to their implementation structures, and analyzes several representative models in detail. Furthermore, the performance of Transformer-based single-channel speech enhancement is compared on commonly used public datasets, and the advantages and disadvantages of the models are analyzed. Finally, the shortcomings of existing work are summarized and future directions are discussed.
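The multi-head attention mechanism the abstract refers to lets every speech frame attend to every other frame, which is how long-term correlations are captured. A minimal NumPy sketch of scaled dot-product multi-head self-attention over a sequence of speech feature frames is shown below; the random projection weights, dimensions, and function name are illustrative assumptions, not the surveyed models' actual parameters:

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """Minimal multi-head self-attention over a feature sequence.

    x: (seq_len, d_model) array, e.g. frames of a noisy spectrogram.
    Projection weights are random here purely for illustration;
    in a real Transformer they are learned.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    # Random matrices stand in for the learned Q/K/V/output projections.
    w_q = rng.standard_normal((d_model, d_model))
    w_k = rng.standard_normal((d_model, d_model))
    w_v = rng.standard_normal((d_model, d_model))
    w_o = rng.standard_normal((d_model, d_model))

    # Project, then split the model dimension into `num_heads` heads:
    # shapes become (num_heads, seq_len, d_head).
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention: each frame scores its similarity
    # to every other frame, so distant context contributes directly.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, heads concatenated back to d_model.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 64))  # 100 time frames, 64-dim features
enhanced = multi_head_attention(frames, num_heads=8, rng=rng)
print(enhanced.shape)  # (100, 64)
```

Note that each head attends over the full sequence in parallel, which is why the attention weight matrix is `seq_len × seq_len` per head; the surveyed enhancement models differ mainly in where such blocks sit in the network and what features they operate on.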

Key words: speech enhancement, deep learning, Transformer, single-channel, multi-head attention mechanism