Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (10): 127-134. DOI: 10.3778/j.issn.1002-8331.1803-0060

• Pattern Recognition and Artificial Intelligence •

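Automatic Recognition of Hypernasality Grades in Cleft Palate Speech Combined with an Auditory Model

FU Fangling1, HE Fei1, FU Jia1, YIN Heng2, HUANG Hua1, HE Ling1

  1. College of Electrical Engineering and Information Technology, Sichuan University, Chengdu 610065
  2. West China Hospital of Stomatology, Sichuan University, Chengdu 610041
  • Publication date: 2019-05-15 Release date: 2019-05-13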

Automatic Detection of Hypernasality Degrees in Cleft Palate Speech Based on Human Auditory Model

FU Fangling1, HE Fei1, FU Jia1, YIN Heng2, HUANG Hua1, HE Ling1   

  1. College of Electrical Engineering and Information Technology, Sichuan University, Chengdu 610065, China
  2. West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
  • Online: 2019-05-15 Published: 2019-05-13

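Abstract: Automatic recognition of hypernasality grades in cleft palate speech can provide an effective, objective, and non-invasive aid for the clinical assessment of velopharyngeal function. This paper studies an automatic classification system for hypernasality grades in cleft palate speech. An auditory model is used to extract the internal auditory representation of the speech signal, and a synchronous detector is then applied to extract Soft Limited Ratio (SLR) spectral features as the feature parameters. A one-versus-one support vector machine (1-v-1 SVM) assigns each sample to one of four hypernasality grades (normal, mild, moderate, and severe). The experiments use 3,086 speech samples from 56 children and compare how recognition performance is affected by the type and number of basilar membrane filters, and by the choice between a synchronous detector and a lateral inhibition network. The results show that Gammatone filters on the Equivalent Rectangular Bandwidth (ERB) scale outperform wavelet packet filters on the Bark scale; that a 54-channel filter bank effectively balances computation time against recognition accuracy; and that SLR spectral features extracted with the synchronous detector outperform LIN (Lateral Inhibition Network) spectral features extracted with the lateral inhibition network. The highest classification accuracy of the four-grade automatic recognition system is 91.50%.

Keywords: cleft palate speech, hypernasality, auditory model, synchronous detector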
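The one-versus-one scheme named in the abstract can be illustrated with a minimal sketch. With four grades, a 1-v-1 classifier trains one binary model per pair of grades (six in total) and picks the grade that wins the most pairwise decisions. The function names, the example winners, and the tie-breaking rule below are assumptions made for illustration; the abstract does not describe how the authors resolve ties or configure the underlying SVMs.

```python
from collections import Counter
from itertools import combinations

# The four hypernasality grades listed in the abstract.
GRADES = ["normal", "mild", "moderate", "severe"]


def one_vs_one_pairs(labels):
    """Return every unordered pair of labels; one binary classifier per pair."""
    return list(combinations(labels, 2))


def vote(pairwise_winners):
    """Pick the grade with the most pairwise wins.

    Ties go to the grade listed first in GRADES. That rule is an
    assumption for this sketch, not something stated in the abstract.
    """
    counts = Counter(pairwise_winners)
    return max(GRADES, key=lambda g: (counts[g], -GRADES.index(g)))


pairs = one_vs_one_pairs(GRADES)
# Hypothetical outputs of the six binary classifiers, in the order of `pairs`.
winners = ["mild", "moderate", "normal", "mild", "mild", "moderate"]
predicted = vote(winners)
```

Here `winners` stands in for the decisions of six trained binary SVMs on a single sample; in a real system each entry would come from the classifier trained on the corresponding pair in `pairs`.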

Abstract: Automatic detection of hypernasality degrees in cleft palate speech can provide an effective, objective, and non-invasive basis for assessing velopharyngeal function in clinical practice. This work investigates an automatic detection system for hypernasality degrees in cleft palate speech. A human auditory model is applied as front-end processing to extract the internal representation of the speech signal, and SLR (Soft-Limited Ratio) spectral features extracted with a synchronous detector are used as the acoustic feature parameters. A 1-v-1 SVM (one-versus-one Support Vector Machine) automatically classifies the hypernasality degree (normal, mild, moderate, or severe). The experimental data comprise 3,086 speech samples from 56 children. The experiments compare the type and number of filters in the filter bank, and the synchronous detector against the lateral inhibition network. The results show that the Gammatone filter based on the ERB (Equivalent Rectangular Bandwidth) scale performs better than the wavelet packet filter based on the Bark scale, that a filter bank with 54 channels effectively balances the time cost and recognition accuracy of the algorithm, and that SLR spectral features extracted with the synchronous detector yield better recognition than LIN spectral features extracted with the lateral inhibition network. The highest accuracy achieved for automatic detection of the four hypernasality degrees is 91.50%.

Key words: cleft palate speech, hypernasality, auditory model, synchronous detector
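As an illustration of what an ERB-scale filter bank with 54 channels involves, the sketch below spaces 54 centre frequencies uniformly on the ERB-rate scale, using the widely cited Glasberg and Moore approximation ERB(f) = 24.7(4.37f/1000 + 1) Hz. The frequency range of 80 Hz to 8000 Hz and all function names are assumptions chosen for the example; the abstract does not state the range or the exact filter design the authors used.

```python
import math


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz
    (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def erb_center_frequencies(n_channels=54, f_low=80.0, f_high=8000.0):
    """Centre frequencies spaced uniformly on the ERB-rate scale.

    f_low and f_high are assumed values for this sketch.
    """
    e_low = hz_to_erb_rate(f_low)
    e_high = hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_channels)]


center_freqs = erb_center_frequencies()
bandwidths = [erb_bandwidth(f) for f in center_freqs]
```

Each centre frequency and its bandwidth would parameterise one Gammatone channel. Because the spacing is uniform on the ERB-rate scale rather than in Hz, channels are denser at low frequencies, which mirrors the frequency resolution of the basilar membrane.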