Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (21): 152-156.

• Pattern Recognition and Artificial Intelligence •

Research on algorithm of spectral feature extraction for speech emotion recognition

TANG Guichen1, FENG Yueqin1, LIANG Ruiyu1,2, BAO Yongqiang1, ZHAO Li2   

  1. School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China
  2. School of Information Science and Engineering, Southeast University, Nanjing 210096, China
  • Online: 2016-11-01 Published: 2016-11-17

Abstract: The accuracy of speech emotion recognition depends largely on how well the extracted features separate different emotions. Starting from an analysis of the time-frequency characteristics of speech, and drawing on the human auditory selective attention mechanism, this paper proposes a speech emotion recognition algorithm based on spectrogram features. First, the algorithm simulates the auditory selective attention mechanism of the human ear: the emotional spectrogram is segmented and extracted in both the time and frequency domains to form a speech emotion saliency map. Second, based on the saliency map, Hu moment invariants, texture features, and a subset of spectrogram features are used as the main features for emotion recognition. Finally, speech emotion is recognized with a support vector machine. Recognition experiments on an emotional speech database show that the proposed algorithm achieves a high recognition rate and good robustness, most notably for irritability, an emotion of practical importance. In addition, principal component analysis of the features across emotions shows that the selected features differ substantially between emotions and are well suited to practical use.
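One step of the pipeline described above, computing Hu moment invariants from a spectrogram, can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives no frame length, hop size, or window, so the values here are arbitrary; the saliency-map step is skipped and the moments are taken directly from the raw magnitude spectrogram; and the function names `spectrogram` and `hu_moments` are hypothetical, not the authors' code. The texture features and the support vector machine classifier are omitted.

```python
import cmath
import math


def spectrogram(signal, frame_len=32, hop=16):
    """Magnitude spectrogram from a naive DFT: rows are frames, columns are bins."""
    # Symmetric Hann window to reduce spectral leakage (window choice is assumed).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def hu_moments(img):
    """Seven Hu moment invariants of a 2-D grid of non-negative intensities."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    if m00 <= 0:
        raise ValueError("image must have positive total intensity")
    xbar = sum(x * v for x, _, v in pixels) / m00
    ybar = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment about the intensity centroid, normalised for scale.
        mu = sum((x - xbar) ** p * (y - ybar) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# Toy input: a pure tone with 8 samples per cycle (an assumption, not the paper's data).
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
features = hu_moments(spectrogram(tone))  # 7-element feature vector
```

The sketch uses only the standard library so that it runs anywhere. A practical implementation would more likely compute the moments with `cv2.HuMoments` from OpenCV and pass the resulting feature vectors to a classifier such as scikit-learn's `SVC`; neither choice is stated in the abstract.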

Key words: speech emotion recognition, auditory selective attention, speech spectrum, support vector machine