[1] HAN K,YU D,TASHEV I.Speech emotion recognition using deep neural network and extreme learning machine[C]//15th Annual Conference of the International Speech Communication Association,2014:223-227.
[2] LEE J,TASHEV I.High-level feature representation using recurrent neural network for speech emotion recognition[C]//16th Annual Conference of the International Speech Communication Association,2015:1-4.
[3] NEUMANN M,VU N T.Attentive convolutional neural network-based speech emotion recognition:a study on the impact of input features,signal length,and acted speech[C]//18th Annual Conference of the International Speech Communication Association,2017:1263-1267.
[4] TASHEV I J,WANG Z Q,GODIN K.Speech emotion recognition based on Gaussian mixture models and deep neural networks[C]//2017 Information Theory and Applications Workshop,2017:1-4.
[5] MUSTAQEEM,SAJJAD M,KWON S.Clustering-based speech emotion recognition by incorporating learned features and deep Bi-LSTM[J].IEEE Access,2020,8:79861-79875.
[6] ZADEH A,CHEN M,PORIA S,et al.Tensor fusion network for multimodal sentiment analysis[C]//2017 Conference on Empirical Methods in Natural Language Processing,2017:1103-1114.
[7] JIN Q,LI C,CHEN S,et al.Speech emotion recognition with acoustic and lexical features[C]//2015 IEEE International Conference on Acoustics,Speech and Signal Processing,2015:4749-4753.
[8] SAHAY S,KUMAR S H,XIA R,et al.Multimodal relational tensor network for sentiment and emotion classification[C]//Grand Challenge & Workshop on Human Multimodal Language,2018.
[9] AKHTAR M S,CHAUHAN D S,GHOSAL D,et al.Multi-task learning for multi-modal emotion recognition and sentiment analysis[C]//2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies,2019:370-379.
[10] ZHANG B,KHORRAM S,PROVOST E M.Exploiting acoustic and lexical properties of phonemes to recognize valence from speech[C]//2019 IEEE International Conference on Acoustics,Speech and Signal Processing,2019:5871-5875.
[11] PORIA S,MAJUMDER N,HAZARIKA D,et al.Multimodal sentiment analysis:addressing key issues and setting up the baselines[J].IEEE Intelligent Systems,2018,33(6):17-25.
[12] GAMAGE K W,SETHU V,AMBIKAIRAJAH E.Salience based lexical features for emotion recognition[C]//2017 IEEE International Conference on Acoustics,Speech and Signal Processing,2017:5830-5834.
[13] SEBASTIAN J,PIERUCCI P.Fusion techniques for utterance-level emotion recognition combining speech and transcripts[C]//20th Annual Conference of the International Speech Communication Association,2019:51-55.
[14] PEPINO L,RIERA P,FERRER L,et al.Fusion approaches for emotion recognition from speech using acoustic and text-based features[C]//2020 IEEE International Conference on Acoustics,Speech and Signal Processing,2020:6484-6488.
[15] YOON S,BYUN S,JUNG K.Multimodal speech emotion recognition using audio and text[C]//2018 IEEE Spoken Language Technology Workshop,2018:112-118.