Automatic Detection of Hypernasality Degrees in Cleft Palate Speech Based on Human Auditory Model

doi:10.3778/j.issn.1002-8331.1803-0060

Computer Engineering and Applications, 2019, Vol. 55, Issue (10): 127-134. DOI: 10.3778/j.issn.1002-8331.1803-0060

FU Fangling1, HE Fei1, FU Jia1, YIN Heng2, HUANG Hua1, HE Ling1

1. College of Electrical Engineering and Information Technology, Sichuan University, Chengdu 610065, China
2. West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China

Online: 2019-05-15 Published: 2019-05-13

Abstract

Abstract: Automatic detection of hypernasality degrees in cleft palate speech can provide an effective, objective and non-invasive aid for the clinical assessment of velopharyngeal function. This work studies an automatic classification system for hypernasality degrees in cleft palate speech. A human auditory model is used as front-end processing to extract the internal auditory representation of the speech signal, and Soft-Limited Ratio (SLR) spectral features extracted with a synchronous detector serve as the feature parameters. A one-versus-one Support Vector Machine (1-v-1 SVM) then assigns each sample to one of four hypernasality degrees (normal, mild, moderate and severe). The experiments use 3 086 speech samples from 56 children and compare how recognition is affected by the type and number of basilar membrane filters, and by the choice between the synchronous detector and the lateral inhibition network. The results show that the Gammatone filter based on the Equivalent Rectangular Bandwidth (ERB) scale outperforms the wavelet-packet filter based on the Bark scale; that a filter bank with 54 channels gives an effective trade-off between computation time and recognition accuracy; and that SLR spectral features extracted with the synchronous detector yield better recognition than the Lateral Inhibition Network (LIN) spectral features. The highest classification accuracy of the four-degree automatic hypernasality detection system is 91.50%.

Key words: cleft palate speech, hypernasality, auditory model, synchronous detector
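
As an illustration of the ERB-scale Gammatone front end named in the abstract, the following is a minimal sketch, not the authors' implementation. It uses the standard Glasberg and Moore ERB formulas and a fourth-order Gammatone impulse response. Only the channel count of 54 comes from the abstract; the frequency range (100 Hz to 4000 Hz), the sampling rate (16 kHz), the 25 ms impulse-response length, the bandwidth factor 1.019 and all function names are assumptions chosen for the example.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_center_frequencies(n_channels=54, f_low=100.0, f_high=4000.0):
    """Center frequencies spaced uniformly on the ERB-rate scale.

    n_channels=54 follows the abstract; f_low and f_high are assumed.
    """
    e_low = hz_to_erb_rate(f_low)
    e_high = hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_channels)]


def gammatone_ir(fc, fs=16000, duration=0.025, order=4):
    """Unnormalized Gammatone impulse response for one channel.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    with b = 1.019 * ERB(fc). fs and duration are assumed values.
    """
    b = 1.019 * erb_bandwidth(fc)
    n_samples = int(round(duration * fs))
    ir = []
    for k in range(n_samples):
        t = k / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc * t))
    return ir


if __name__ == "__main__":
    centers = erb_center_frequencies()
    print(len(centers), centers[0], centers[-1])
    print(len(gammatone_ir(centers[0])))
```

Convolving a speech frame with each channel's impulse response gives one band-limited signal per channel. In the system described above, the synchronous detector would then turn these channel outputs into the SLR spectrum; that stage is not sketched here.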

FU Fangling, HE Fei, FU Jia, YIN Heng, HUANG Hua, HE Ling. Automatic Detection of Hypernasality Degrees in Cleft Palate Speech Based on Human Auditory Model[J]. Computer Engineering and Applications, 2019, 55(10): 127-134.
