[1] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, 2017.
[2] DONG L H, XU S, XU B. Speech-Transformer: a no-recurrence sequence-to-sequence model for speech recognition[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 5884-5888.
[3] GULATI A, QIN J, CHIU C C, et al. Conformer: convolution-augmented transformer for speech recognition[C]//Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020: 5036-5040.
[4] LIU N, WANG L J, LI Y M, et al. End-to-end speech recognition model based on dilated sparse aware network[C]//Proceedings of the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology. Piscataway: IEEE, 2024: 1703-1707.
[5] KITAEV N, KAISER Ł, LEVSKAYA A. Reformer: the efficient transformer[C]//Proceedings of the 8th International Conference on Learning Representations, 2020: 1-11.
[6] DU J, TANG M T, ZHAO L. Transformer-like model with linear attention for speech emotion recognition[J]. Journal of Southeast University (English Edition), 2021, 37(2): 164-170.
[7] WINATA G I, CAHYAWIJAYA S, LIN Z J, et al. Lightweight and efficient end-to-end speech recognition using low-rank transformer[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 6144-6148.
[8] GEVA M, SCHUSTER R, BERANT J, et al. Transformer feed-forward layers are key-value memories[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 5484-5495.
[9] WANG J Z, LIANG Z Q, ZHANG X L, et al. EfficientASR: speech recognition network compression via attention redundancy and chunk-level FFN optimization[J]. arXiv:2404.19214, 2024.
[10] GAO S, RAMANATHAN A, TOURASSI G. Hierarchical convolutional attention networks for text classification[C]//Proceedings of the 3rd Workshop on Representation Learning for NLP. Stroudsburg: ACL, 2018: 11-23.
[11] ZENG K G, PAIK I. A lightweight transformer with convolutional attention[C]//Proceedings of the 2020 11th International Conference on Awareness Science and Technology. Piscataway: IEEE, 2020: 1-6.
[12] WANG H D, SHEN X, TU M, et al. Improved transformer with multi-head dense collaboration[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 2754-2767.
[13] BU H, DU J Y, NA X Y, et al. AISHELL-1: an open-source mandarin speech corpus and a speech recognition baseline[C]//Proceedings of the 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment. Piscataway: IEEE, 2017: 1-5.
[14] CHUNG J S, SENIOR A, VINYALS O, et al. Lip reading sentences in the wild[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3444-3453.
[15] AL-RFOU R, CHOE D, CONSTANT N, et al. Character-level language modeling with deeper self-attention[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 3159-3166.
[16] HU Z F, JIAN F, TANG S S, et al. DFSMN-T: Mandarin speech recognition with language model Transformer[J]. Computer Engineering and Applications, 2022, 58(9): 187-194.
[17] FAN C H, YI J Y, TAO J H, et al. Gated recurrent fusion with joint training framework for robust end-to-end speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 29: 198-209.
[18] TIAN Z K, YI J Y, TAO J H, et al. Spike-triggered non-autoregressive transformer for end-to-end speech recognition[C]//Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020: 5026-5030.
[19] SHEN Y W, SUN J. Lightweight Chinese speech recognition with Transformer[J]. Application Research of Computers, 2023, 40(2): 424-429.