XU Zhijing, LIU Xia. Bimodal Emotion Recognition Model Based on Cascaded Two Channel Phased Fusion[J]. Computer Engineering and Applications, 2023, 59(8): 127-137.
[1] HAN W,RUAN H,CHEN X,et al.Towards temporal modelling of categorical speech emotion recognition[C]//Proc Interspeech,2018:932-936.
[2] EYBEN F,WENINGER F,GROSS F,et al.Recent developments in openSMILE,the Munich open-source multimedia feature extractor[C]//Proceedings of the 21st ACM International Conference on Multimedia,2013:835-838.
[3] HAN K,YU D,TASHEV I.Speech emotion recognition using deep neural network and extreme learning machine[C]//Proc Interspeech,2014:223-227.
[4] TRIGEORGIS G,RINGEVAL F,BRUECKNER R,et al.Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network[C]//2016 IEEE International Conference on Acoustics,Speech and Signal Processing,2016:5200-5204.
[5] YANG C,LIN K H Y,CHEN H H.Emotion classification using Web blog corpora[C]//IEEE/WIC/ACM International Conference on Web Intelligence,2007:275-278.
[6] PENNINGTON J,SOCHER R,MANNING C.GloVe:global vectors for word representation[C]//Proc Conf Empirical Methods Natural Lang Process(EMNLP),2014:1532-1543.
[7] MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems,2013:3111-3119.
[8] PORIA S,CHATURVEDI I.Convolutional MKL based multimodal emotion recognition and sentiment analysis[C]//2016 IEEE 16th International Conference on Data Mining,2016.
[9] JIAO W,YANG H,KING I,et al.HiGRU:hierarchical gated recurrent units for utterance-level emotion recognition[C]//NAACL,2019.
[10] YOON S,BYUN S,JUNG K.Multimodal speech emotion recognition using audio and text[C]//2018 IEEE Spoken Language Technology Workshop(SLT),2018:112-118.
[11] TRIPATHI S,KUMAR A,RAMESH A,et al.Deep learning based emotion recognition system using speech features and transcriptions[J].arXiv:1906.05681v1,2019.
[12] XU Zhijing,GAO Shan.Multi-modal emotion recognition based on Transformer-ESIM attention mechanism[J].Computer Engineering and Applications,2022,58(10):132-138.
[13] XU Haiyang,ZHANG Hui,HAN Kun,et al.Learning alignment for multimodal emotion recognition from speech[C]//Proc Interspeech,2019:3569-3573.
[14] YOON S,BYUN S,DEY S,et al.Speech emotion recognition using multi-hop attention mechanism[C]//2019 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP),2019:2822-2826.
[15] SIRIWARDHANA S,KALUARACHCHI T,BILLINGHURST M,et al.Multimodal emotion recognition with transformer-based self supervised feature fusion[J].IEEE Access,2020,8:176274-176285.
[16] SUN L,LIU B,TAO J,et al.Multimodal cross- and self-attention network for speech emotion recognition[C]//2021 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP),2021:4275-4279.
[17] WANG Lanxin,WANG Weiya,CHENG Xin.Bimodal emotion recognition model for speech-text based on Bi-LSTM-CNN[J].Computer Engineering and Applications,2022,58(4):192-197.
[18] BUSSO C,BULUT M,LEE C C,et al.IEMOCAP:interactive emotional dyadic motion capture database[J].Language Resources and Evaluation,2008,42:335-359.
[19] SINGH P,SRIVASTAVA R,RANA K P S.A multimodal hierarchical approach to speech emotion recognition from audio and text[J].Knowledge-Based Systems,2021,229:107316.
[20] DENG J J,LEUNG C H C,LI Y.Multimodal emotion recognition using transfer learning on audio and text data[C]//Computational Science and Its Applications-ICCSA,2021:552-563.
[21] LEE Y,YOON S,JUNG K.Multimodal speech emotion recognition using cross attention with aligned audio and text[C]//Proc Interspeech,2020:2717-2721.
[22] LEE S,HAN D K,KO H.Multimodal emotion recognition fusion analysis adapting BERT with heterogeneous feature unification[J].IEEE Access,2021,9:94557-94572.
[23] DEVLIN J,CHANG M W,LEE K,et al.BERT:pre-training of deep bidirectional transformers for language understanding[C]//NAACL-HLT(1),2019.
[24] PORIA S,MAJUMDER N,HAZARIKA D,et al.Multimodal sentiment analysis:addressing key issues and setting up the baselines[J].IEEE Intelligent Systems,2018,33(6):17-25.
[25] EYBEN F,WÖLLMER M,SCHULLER B.openSMILE:the Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia.Association for Computing Machinery,New York,NY,USA,2010:1459-1462.
[26] PEPINO L,RIERA P,FERRER L,et al.Fusion approaches for emotion recognition from speech using acoustic and text-based features[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing,2020:6484-6488.
[27] HAZARIKA D,PORIA S,MIHALCEA R,et al.Icon:interactive conversational memory network for multimodal emotion detection[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,2018:2594-2604.
[28] HAZARIKA D,PORIA S,ZADEH A,et al.Conversational memory network for emotion recognition in dyadic dialogue videos[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies,Volume 1(Long Papers),2018:2122-2132.