Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 62-74. DOI: 10.3778/j.issn.1002-8331.2309-0154

• Research Hotspots and Reviews •

Survey of Specific Speech Recognition Algorithms for Dysarthria

SONG Wei, ZHANG Yanghao   

  1. School of Information Engineering, Minzu University of China, Beijing 100081, China
  2. National Language Resource Monitoring and Research Center for Minority Languages, Beijing 100081, China
  3. Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Beijing 100081, China
  • Published: 2024-06-01 Online: 2024-05-31

Abstract: Dysarthria is a difficult medical condition, and current mainstream speech recognition technology does not adapt well to the needs of this field. Speech recognition for dysarthria combines pre-training with personalized training, and this data-driven approach has further improved algorithm performance and reduced the recognition word error rate. Even so, the technology remains some distance from practical commercial use, and its development is limited by data scale and by the available techniques. To date, no survey article has been published on speech recognition for dysarthria. A comparative analysis of dataset construction methods and advanced techniques in this field is urgently needed so that researchers entering the field can get up to speed quickly. This paper surveys existing datasets, mainstream algorithms, and evaluation methods, and summarizes the scale, form, and characteristics of mainstream dysarthria datasets in China and abroad. It analyzes the mainstream algorithms for dysarthric speech recognition and reports the performance and characteristics of each. Finally, it examines performance evaluation metrics for algorithm models based on the severity level of patients with dysarthria and discusses future research directions, with the aim of helping researchers working on dysarthric speech recognition and supporting the rapid development of the field.

Key words: dysarthria, speech recognition, deep learning, artificial intelligence