Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (20): 128-133. DOI: 10.3778/j.issn.1002-8331.1705-0377

• Pattern Recognition and Artificial Intelligence •

Nonlinear geometric feature extraction algorithm for emotional speech recognition

SONG Chunxiao (宋春晓), SUN Ying (孙颖)

  1. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China
  • Online: 2017-10-15 Published: 2017-10-31

Abstract: To address the limitations of existing time-domain and frequency-domain features in distinguishing emotional states, a nonlinear geometric feature extraction method based on phase space reconstruction theory is proposed. First, the phase space is reconstructed by analyzing the minimum delay time and the embedding dimension of the emotional speech signal. Second, five nonlinear geometric features based on the contour of the trajectory are analyzed and extracted in the reconstructed phase space. Finally, the proposed features are combined with prosodic features, MFCC features, and chaotic features in experiments designed to verify their ability to distinguish emotional states, and feature selection is applied to obtain an optimal feature set with complete emotional information. Five emotions (happiness, sadness, neutral, anger, and fear) from the Berlin German speech database are used as the experimental data, and a support vector machine is used as the classifier. The experimental results show that, compared with prosodic, MFCC, and chaotic features, the proposed features not only effectively characterize the emotional differences in speech signals but also compensate for the shortcomings of existing features in describing emotional states.

Key words: phase space reconstruction, emotional speech recognition, nonlinear geometric features, feature selection, optimal feature set
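
The phase space reconstruction step mentioned in the abstract is commonly done with a time-delay embedding. The following is a minimal sketch of that general technique only, not the authors' implementation: the function name `delay_embed` and the `delay` and `dim` values are illustrative assumptions. The abstract does not say how the delay time and embedding dimension are chosen or how the five geometric features are defined, so neither is reproduced here.

```python
def delay_embed(signal, delay, dim):
    """Time-delay embedding of a 1-D sequence (hypothetical helper).

    Point i of the reconstructed trajectory is
    (x[i], x[i + delay], ..., x[i + (dim - 1) * delay]).
    """
    if delay < 1 or dim < 1:
        raise ValueError("delay and dim must be positive integers")
    # Number of complete points that fit inside the signal.
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    return [
        tuple(signal[i + j * delay] for j in range(dim))
        for i in range(n_points)
    ]


# Toy input standing in for a short frame of speech samples.
frame = [0, 1, 2, 3, 4, 5]
# delay=2 and dim=3 are placeholder values, not taken from the paper.
trajectory = delay_embed(frame, delay=2, dim=3)
print(trajectory)
```

Each tuple is one point of the trajectory in the reconstructed phase space. Geometric descriptors of the trajectory's shape would then be computed from these points.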