Computer Engineering and Applications (计算机工程与应用), 2018, Vol. 54, Issue (21): 141-147. DOI: 10.3778/j.issn.1002-8331.1707-0015

• Pattern Recognition and Artificial Intelligence •

Automatic recognition of hypernasality grades in cleft palate speech based on vocal tract characteristics

唐铭1, 何岩萍2, 尹恒3, 刘奇1, 何凌1

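  1. College of Electrical Engineering and Information Technology, Sichuan University, Chengdu 610065
  2. College of Materials Science and Engineering, Sichuan University, Chengdu 610064
  3. West China Hospital of Stomatology, Sichuan University, Chengdu 610041
  • Publication date: 2018-11-01 Release date: 2018-10-30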

Hypernasality detection in cleft palate speech based on vocal tract characteristics

TANG Ming1, HE Yanping2, YIN Heng3, LIU Qi1, HE Ling1   

  1. College of Electrical Engineering and Information Technology, Sichuan University, Chengdu 610065, China
  2. College of Materials Science and Engineering, Sichuan University, Chengdu 610064, China
  3. West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China
  • Online: 2018-11-01 Published: 2018-10-30

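Abstract: Automatic recognition of hypernasality grades in cleft palate speech has important clinical value for assessing velopharyngeal function. This work studies algorithms for the automatic recognition of hypernasality grades in cleft palate speech and proposes one based on vocal tract characteristics. High-order and low-order linear prediction cepstrum coefficients (Linear Prediction Cepstrum Coefficient, LPCC) are combined with cepstrum coefficients to form the LPCC-Cep feature group, which serves as the acoustic feature parameter, and a sparse representation classifier (Sparse Representation based Classification, SRC) is used to recognize four hypernasality grades in cleft palate speech (normal, mild, moderate and severe). Experimental results show that the proposed algorithm achieves a high correct recognition rate across the hypernasality classes; with the LPCC-Cep feature group, the correct recognition rate for hypernasality grades is 83.38%.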

Key words: cleft palate speech, hypernasality, linear prediction, sparse representation, cepstrum
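
The feature construction named in the abstract above (low-order and high-order LPCC concatenated with cepstrum coefficients) can be illustrated with a minimal sketch. The abstract does not state the orders, the frame length, or any pre-processing, so the values `low_order=12`, `high_order=24` and `n_cep=12`, as well as all function names, are hypothetical choices made for illustration only and should not be read as the authors' settings. The frame is assumed to be already pre-emphasized and windowed, and a naive DFT is used only to keep the example free of third-party dependencies.

```python
import cmath
import math

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion; returns a[1..order] of the predictor
    x[n] ~ sum_k a[k] * x[n-k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # degenerate (e.g. silent) frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpcc(a, n_ceps):
    """LPC-to-cepstrum recursion for the all-pole model 1 / (1 - sum a_k z^-k).
    The gain term c[0] is omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def real_cepstrum(frame, n_ceps):
    """Real cepstrum: inverse DFT of the log magnitude spectrum (naive O(n^2) DFT)."""
    n = len(frame)
    log_mag = []
    for f in range(n):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        log_mag.append(math.log(abs(s) + 1e-12))   # small floor avoids log(0)
    return [sum(log_mag[f] * math.cos(2 * math.pi * f * q / n) for f in range(n)) / n
            for q in range(1, n_ceps + 1)]

def lpcc_cep(frame, low_order=12, high_order=24, n_cep=12):
    """Concatenate low-order LPCC, high-order LPCC and cepstrum coefficients.
    All three sizes are assumed values, not taken from the paper."""
    low = lpcc(lpc(frame, low_order), low_order)
    high = lpcc(lpc(frame, high_order), high_order)
    return low + high + real_cepstrum(frame, n_cep)
```

With these assumed sizes, `lpcc_cep(frame)` returns a 48-dimensional vector per frame. How the paper aggregates frame-level features per utterance is not stated in the abstract and is left out here.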

Abstract: Automatic detection of hypernasality in cleft palate speech has important clinical value, because the detected hypernasality grade provides critical information for assessing velopharyngeal function. In this work, an automatic hypernasality detection algorithm for cleft palate speech based on vocal tract characteristics is proposed. High-order and low-order linear prediction cepstrum coefficients (LPCC) are combined with cepstrum coefficients to form a feature set called LPCC-Cep. Using this feature set as the acoustic parameter, a sparse representation based classifier (SRC) assigns each speech sample to one of four hypernasality grades: normal, mild, moderate or severe. Experimental results show that the proposed algorithm achieves high detection accuracy: with the LPCC-Cep feature set, the accuracy of hypernasality grade detection is 83.38%.

Key words: cleft palate speech, hypernasality, linear prediction, sparse representation, cepstrum
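
The classification step can be sketched in the same spirit. In sparse representation based classification, a test feature vector is coded as a sparse combination of training vectors, and the predicted class is the one whose training vectors alone reconstruct the test vector with the smallest residual. The abstract does not say which sparse solver or regularization the authors use, so the sketch below substitutes a plain iterative soft-thresholding (ISTA) solver for the l1-regularized least-squares problem. The function names, `lam=0.01` and `n_iter=500` are assumptions for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit l2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return [t / norm for t in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_threshold(t, thr):
    """Proximal operator of the l1 norm."""
    return math.copysign(max(abs(t) - thr, 0.0), t)

def combine(atoms, coeffs, dim):
    """Linear combination sum_j coeffs[j] * atoms[j]."""
    return [sum(c * atom[i] for c, atom in zip(coeffs, atoms)) for i in range(dim)]

def ista(atoms, y, lam, n_iter):
    """Minimize 0.5 * ||y - D x||^2 + lam * ||x||_1, where the columns of D are `atoms`.
    The step 1 / ||D||_F^2 is a conservative bound that guarantees convergence."""
    step = 1.0 / sum(dot(a, a) for a in atoms)
    x = [0.0] * len(atoms)
    for _ in range(n_iter):
        recon = combine(atoms, x, len(y))
        resid = [yi - ri for yi, ri in zip(y, recon)]
        x = [soft_threshold(xj + step * dot(atom, resid), step * lam)
             for xj, atom in zip(x, atoms)]
    return x

def src_classify(train_feats, train_labels, test_feat, lam=0.01, n_iter=500):
    """Return (predicted_label, per-class residuals) for one test vector."""
    atoms = [normalize(f) for f in train_feats]
    y = normalize(test_feat)
    x = ista(atoms, y, lam, n_iter)
    residuals = {}
    for label in set(train_labels):
        # Keep only the coefficients that belong to this class.
        masked = [xj if lab == label else 0.0 for xj, lab in zip(x, train_labels)]
        recon = combine(atoms, masked, len(y))
        residuals[label] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return min(residuals, key=residuals.get), residuals

# Toy 4-dimensional data standing in for LPCC-Cep vectors (purely synthetic).
grades = ["normal", "mild", "moderate", "severe"]
train_feats, train_labels = [], []
for idx, grade in enumerate(grades):
    for eps in (0.05, -0.05):
        vec = [eps] * 4
        vec[idx] = 1.0
        train_feats.append(vec)
        train_labels.append(grade)

predicted, residuals = src_classify(train_feats, train_labels, [0.02, -0.01, 0.9, 0.03])
```

The toy vectors only demonstrate the decision rule. They carry no acoustic meaning and say nothing about the 83.38% accuracy reported above.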