Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 62-74.DOI: 10.3778/j.issn.1002-8331.2309-0154
• Research Hotspots and Reviews •
SONG Wei, ZHANG Yanghao
Online: 2024-06-01
Published: 2024-05-31
SONG Wei, ZHANG Yanghao. Survey of Specific Speech Recognition Algorithms for Dysarthria[J]. Computer Engineering and Applications, 2024, 60(11): 62-74.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2309-0154