Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (20): 197-202. DOI: 10.3778/j.issn.1002-8331.2006-0283


Multi-task Learning for Speech Enhancement and Detection

WANG Shiqi, ZENG Qingning, LONG Chao, XIONG Songling, QI Xiaoxiao   

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Online:2021-10-15 Published:2021-10-21





In many real-world speech signal processing applications, systems must handle multiple tasks in real time with low latency and strong robustness to noise. To address this, a multi-task deep learning model for speech enhancement and Voice Activity Detection (VAD) is proposed. The model introduces a Long Short-Term Memory (LSTM) network to build a causal system suitable for real-time online processing. Exploiting the strong correlation between speech enhancement and VAD, the output layers of the two tasks are connected through hard parameter sharing, which reduces the number of parameters and improves the generalization ability of both tasks through multi-task learning. Experimental results show that the multi-task model is 44.2% faster than serial processing with the baseline models, while achieving similar speech enhancement results and better VAD results, which is of great significance for the application and deployment of deep learning models.
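The idea of hard parameter sharing in a causal recurrent model can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SharedLSTMMultiTask`, the layer sizes, the random weights, the omission of bias terms, and the choice of a per-bin mask as the enhancement output are all assumptions made for illustration. The sketch shows only the structure described in the abstract: one shared LSTM trunk processed frame by frame, feeding two task-specific output heads.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these by training.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SharedLSTMMultiTask:
    """Toy hard-parameter-sharing model: one LSTM trunk, two output heads."""

    def __init__(self, n_feat, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Shared trunk: one weight matrix per gate (input, forget, cell, output),
        # each acting on the concatenation [x_t, h_{t-1}]. Biases are omitted.
        self.gates = {g: rand_matrix(n_hidden, n_feat + n_hidden, rng) for g in "ifgo"}
        # Task-specific heads on top of the shared hidden state (assumed shapes).
        self.w_mask = rand_matrix(n_feat, n_hidden, rng)  # enhancement: per-bin mask
        self.w_vad = rand_matrix(1, n_hidden, rng)        # VAD: speech probability

    def step(self, x, state):
        h, c = state
        z = x + h  # list concatenation of current frame and previous hidden state
        i = [sigmoid(a) for a in matvec(self.gates["i"], z)]
        f = [sigmoid(a) for a in matvec(self.gates["f"], z)]
        g = [math.tanh(a) for a in matvec(self.gates["g"], z)]
        o = [sigmoid(a) for a in matvec(self.gates["o"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Both heads read the same hidden state, so the trunk is computed once.
        mask = [sigmoid(a) for a in matvec(self.w_mask, h)]
        vad = sigmoid(matvec(self.w_vad, h)[0])
        return mask, vad, (h, c)

    def run(self, frames):
        # Causal processing: each output depends only on current and past frames.
        state = ([0.0] * self.n_hidden, [0.0] * self.n_hidden)
        outputs = []
        for x in frames:
            mask, vad, state = self.step(x, state)
            enhanced = [m * v for m, v in zip(mask, x)]
            outputs.append((enhanced, vad))
        return outputs


# Hypothetical input: 5 frames of 8 non-negative spectral features each.
rng = random.Random(1)
frames = [[rng.random() for _ in range(8)] for _ in range(5)]
model = SharedLSTMMultiTask(n_feat=8, n_hidden=4)
outputs = model.run(frames)
```

Because the two heads share the trunk, one forward pass yields both the enhanced frame and the VAD score, whereas running two separate models in series would evaluate a recurrent trunk twice. This is the intuition behind the parameter and speed savings the abstract reports, although the sketch itself makes no claim about the measured figures.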

Key words: multi-task learning, deep learning, speech enhancement, voice activity detection

