Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (20): 240-244. DOI: 10.3778/j.issn.1002-8331.1806-0468


Audio Enhancement Based on Generative Adversarial Nets

ZHANG Yi, GU Yi, HAN Fang, WANG Zhijie   

  1. School of Information Science and Technology, Donghua University, Shanghai 201620, China
  2. Shanghai Institute of Space Electronics Technology, Shanghai 201109, China
  • Online: 2019-10-15 Published: 2019-10-14




Abstract: Large numbers of audio files are compressed to ease network transmission and to reduce the load on server hard disks, but compression lowers audio quality. This paper proposes ASRGAN (Audio Super Resolution Generative Adversarial Networks), an audio quality enhancement algorithm for MPEG-1 Layer 3 files. In this algorithm, the generative network and the discriminative network learn by competing with each other, and an overlap-weighting scheme is applied. Dilated convolution and a bidirectional recurrent neural network are combined to strengthen the network's ability to handle very long sequences, and the best-performing audio quality restoration network is then selected. The algorithm reduces the network bandwidth and storage space that audio files require while maintaining good audio quality.
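As a rough illustration of why dilated convolution helps with long sequences, the sketch below computes a 1-D dilated convolution and the receptive field of a stack of such layers. The abstract does not give the network's kernel size or dilation schedule, so the function names `dilated_conv1d` and `receptive_field`, the kernel size of 3, and the dilations 1, 2, 4, 8 are all assumptions chosen for illustration, not the authors' configuration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a short kernel
    covers a wide stretch of the input without extra parameters.
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Input samples seen by one output of a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical settings: kernel size 3, dilation doubling per layer.
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With these assumed settings, four layers already cover 31 input samples, whereas four undilated layers of the same kernel size would cover 9; doubling the dilation per layer makes the receptive field grow exponentially with depth.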

Key words: Generative Adversarial Nets (GAN), audio enhancement, model compression

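Abstract (Chinese version, translated): Large numbers of audio files are compressed to ease network transmission and local storage, but the savings in storage space come at the cost of audio quality. Targeting MPEG-1 Layer 3 (mp3), the lossy compression method most commonly used for audio, this work uses ASRGAN (Audio Super-Resolution Generative Adversarial Nets) to restore the quality of audio whose bit rate has been reduced. A generative model and a discriminative model learn by driving each other, and overlap weighting is applied. Dilated convolution and a bidirectional recurrent network strengthen the overall network's ability to process very long sequences, and the best-performing audio enhancement model is finally selected. The method reduces the network bandwidth and storage capacity needed for audio transmission and storage while still achieving good audio quality.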

Key words (Chinese version, translated): Generative Adversarial Nets (GAN), audio quality enhancement, model compression