Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 209-217. DOI: 10.3778/j.issn.1002-8331.2404-0439

• Pattern Recognition and Artificial Intelligence •

Dual-Path Skip-Transformer Based Lightweighting Speech Enhancement Network

JU Wuhan, SUN Chengli, CHEN Feilong, DING Biyun, GUO Qiaosheng

  1. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
  2. School of Information and Communication Engineering, Guangzhou Maritime College, Guangzhou 510700, China
  3. Zhaoyang Gevotai (Xin Feng) Technology Co., Ltd., Ganzhou, Jiangxi 341600, China
  • Online: 2025-08-01 Published: 2025-07-31

Abstract: Decoupled speech enhancement methods split the denoising task into two sub-tasks, magnitude estimation and complex spectrum estimation, and achieve better results than traditional magnitude-spectrum speech enhancement. The Transformer has become a key component of decoupled speech enhancement models because of its parallel computation and its ability to capture long-range dependencies. However, its high computational complexity limits its use on edge devices. This paper proposes a decoupled speech enhancement network, DPST-SENet (dual-path skip-Transformer speech enhancement network). Specifically, DPST-SENet suppresses the main noise components in the magnitude branch while removing residual noise and implicitly enhancing phase information in the complex spectrum branch. The network introduces a Dual-Path Skip-Transformer module, which effectively reuses the information modeled by the Dual-Path Transformer module, reducing the number of parameters and the computational complexity while maintaining strong performance. Experimental results show that DPST-SENet achieves a perceptual evaluation of speech quality (PESQ) score of 3.16 on the 48 kHz full-band speech dataset VoiceBank+DEMAND, outperforming MTFAA, the winning model of the ICASSP 2022 Deep Noise Suppression (DNS) Challenge, with fewer model parameters.

Key words: speech enhancement, full-band, dual-path network, parallel denoising, lightweighting
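
The abstract does not give the network's equations, so the following is only a minimal sketch of the general decoupled idea, not the authors' implementation. It assumes that the magnitude branch produces a real gain per time-frequency bin (applied with the noisy phase kept unchanged) and that the complex spectrum branch produces an additive complex residual. The function name `combine_decoupled_outputs` and the parameters `mag_gains` and `complex_residuals` are hypothetical, and the numeric values are made up for illustration.

```python
import cmath


def combine_decoupled_outputs(noisy_bins, mag_gains, complex_residuals):
    """Combine two branch outputs for a list of complex STFT bins.

    All three arguments are equal-length sequences (hypothetical interface):
      noisy_bins         complex spectrum of the noisy speech
      mag_gains          real gains from an assumed magnitude branch
      complex_residuals  complex corrections from an assumed complex branch
    """
    enhanced = []
    for y, g, r in zip(noisy_bins, mag_gains, complex_residuals):
        # Assumed magnitude branch: scale the magnitude, keep the noisy phase.
        coarse = cmath.rect(g * abs(y), cmath.phase(y))
        # Assumed complex branch: an additive correction to the real and
        # imaginary parts, which can shift the phase without estimating
        # the phase explicitly.
        enhanced.append(coarse + r)
    return enhanced


# Illustrative values only (not data from the paper).
noisy = [3 + 4j, -1 + 1j]
gains = [0.5, 1.0]
residuals = [0j, 0.2 - 0.1j]
enhanced = combine_decoupled_outputs(noisy, gains, residuals)

for before, after in zip(noisy, enhanced):
    print(before, "->", after)
```

In the first bin the residual is zero, so only the magnitude changes and the phase stays that of the noisy input. In the second bin the gain is 1 and the non-zero residual alters the phase, which is the sense in which a complex-domain correction enhances phase implicitly.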