[1] ZHENG C S, ZHANG H Y, LIU W Z, et al. Sixty years of frequency-domain monaural speech enhancement: from traditional to deep learning methods[J]. Trends in Hearing, 2023, 27: 23312165231209913.
[2] JIANG W Q, SUN C L, CHEN F L, et al. Low complexity speech enhancement network based on frame-level Swin transformer[J]. Electronics, 2023, 12(6): 1330.
[3] WANG Y X, WANG D L. Towards scaling up classification-based speech separation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(7): 1381-1390.
[4] LUO Y, CHEN Z, YOSHIOKA T. Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 46-50.
[5] LE X H, CHEN H S, CHEN K J, et al. DPCRN: dual-path convolution recurrent network for single channel speech enhancement[J]. arXiv:2107.05429, 2021.
[6] HU Y X, LIU Y, LV S B, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[J]. arXiv:2008.00264, 2020.
[7] LU Y X, AN Y, LING Z H. Explicit estimation of magnitude and phase spectra in parallel for high-quality speech enhancement[J]. arXiv:2308.08926, 2023.
[8] JU Y K, CHEN J, ZHANG S M, et al. TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab personalized speech enhancement system for ICASSP 2023 DNS-challenge[C]//Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2023: 1-2.
[9] FU Y H, LIU Y, LI J D, et al. Uformer: a Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 7417-7421.
[10] LIN W M, CHEN F L, SUN C L, et al. 3D speech enhancement algorithm based on two-stage U-Net beamforming network[J]. Computer Engineering and Applications, 2023, 59(22): 128-135.
[11] WANG Z Q, WICHERN G, LE ROUX J. On the compensation between magnitude and phase in speech separation[J]. IEEE Signal Processing Letters, 2021, 28: 2018-2022.
[12] DANG F, CHEN H T, ZHANG P Y. DPT-FSNet: dual-path transformer based full-band and sub-band fusion network for speech enhancement[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 6857-6861.
[13] LI J F, WEN Y, HE L H. SCConv: spatial and channel reconstruction convolution for feature redundancy[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 6153-6162.
[14] ZHANG G C, YU L B, WANG C L, et al. Multi-scale temporal frequency convolutional network with axial attention for speech enhancement[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 9122-9126.
[15] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269.
[16] HOU Z S, HU Q W, CHEN K, et al. Attention does not guarantee best performance in speech enhancement[J]. arXiv:2302.05690, 2023.
[17] HAN Q, FAN Z J, DAI Q, et al. On the connection between local attention and dynamic depth-wise convolution[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 45(6): 29.
[18] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
[19] VENKATARAMANAN S, GHODRATI A, ASANO Y M, et al. Skip-attention: improving vision transformers by paying less attention[J]. arXiv:2301.02240, 2023.
[20] LI A D, ZHENG C S, PENG R H, et al. On the importance of power compression and phase estimation in monaural speech dereverberation[J]. JASA Express Letters, 2021, 1(1): 014802.
[21] PAUL D B, BAKER J M. The design for the Wall Street Journal-based CSR corpus[C]//Proceedings of the Workshop on Speech and Natural Language. Morristown: ACL, 1992: 357.
[22] REDDY C K, DUBEY H, KOISHIDA K, et al. Interspeech 2021 deep noise suppression challenge[J]. arXiv:2101.01902, 2021.
[23] VARGA A, STEENEKEN H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3): 247-251.
[24] VEAUX C, YAMAGISHI J, KING S. The voice bank corpus: design, collection and data analysis of a large regional accent speech database[C]//Proceedings of the 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE). Piscataway: IEEE, 2013: 1-4.
[25] THIEMANN J, ITO N, VINCENT E. The diverse environments multi-channel acoustic noise database (DEMAND): a database of multichannel environmental noise recordings[C]//Proceedings of Meetings on Acoustics, 2013: 035081.
[26] LUO Y, MESGARANI N. Conv-TasNet: surpassing ideal time-frequency magnitude masking for speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.
[27] YIN D C, LUO C, XIONG Z W, et al. PHASEN: a phase-and-harmonics-aware speech enhancement network[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 9458-9465.
[28] DANG F, HU Q, ZHANG P Y. THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement[J]. arXiv:2301.07939, 2023.
[29] VALIN J M, ISIK U, PHANSALKAR N, et al. A perceptually-motivated approach for low-complexity, real-time enhancement of fullband speech[J]. arXiv:2008.04259, 2020.
[30] YU G C, LI A D, LIU W Z, et al. Optimizing shoulder to shoulder: a coordinated sub-band fusion model for full-band speech enhancement[C]//Proceedings of the 2022 13th International Symposium on Chinese Spoken Language Processing. Piscataway: IEEE, 2022: 483-487.
[31] SCHRÖTER H, MAIER A, ESCALANTE-B A N, et al. DeepFilterNet2: towards real-time speech enhancement on embedded devices for full-band audio[C]//Proceedings of the 2022 International Workshop on Acoustic Signal Enhancement. Piscataway: IEEE, 2022: 1-5.
[32] CHEN J, WANG Z L, TUO D Y, et al. FullSubNet+: channel attention FullSubNet with complex spectrograms for speech enhancement[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 7857-7861.
[33] YU G C, WANG H, LI A D, et al. FSI-Net: a dual-stage full- and sub-band integration network for full-band speech enhancement[J]. Applied Acoustics, 2023, 211: 109539.
[34] O’SHAUGHNESSY D. Speech enhancement: a review of modern methods[J]. IEEE Transactions on Human-Machine Systems, 2024, 54(1): 110-120.
[35] HOU Z S, HU Q W, CHEN K, et al. Local spectral attention for full-band speech enhancement[J]. arXiv:2302.05693, 2023.