[1] DELLAERT F, POLZIN T, WAIBEL A. Recognizing emotion in speech[C]//Proceedings of the Fourth International Conference on Spoken Language Processing, 1996: 1970-1973.
[2] SCHULLER B W. Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends[J]. Communications of the ACM, 2018, 61(5): 90-99.
[3] SCHULLER B, VALSTAR M, COWIE R, et al. AVEC 2012—the continuous audio/visual emotion challenge[C]//Proceedings of the 2nd International Audio/Visual Emotion Challenge and Workshop, 2012: 449-456.
[4] TRIPATHI S, KUMAR A, RAMESH A, et al. Deep learning based emotion recognition system using speech features and transcriptions[J]. arXiv:1906.05681, 2019.
[5] WANG J, XUE M, CULHANE R, et al. Speech emotion recognition with dual-sequence LSTM architecture[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020: 6474-6478.
[6] LEE S, HAN D K, KO H. Fusion-ConvBERT: parallel convolution and BERT fusion for speech emotion recognition[J]. Sensors, 2020, 20(22): 6688.
[7] YE J, WEN X C, WEI Y, et al. Temporal modeling matters: a novel temporal emotional modeling approach for speech emotion recognition[C]//Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023: 1-5.
[8] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv:1301.3781, 2013.
[9] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014: 1532-1543.
[10] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[J]. arXiv:1810.04805, 2018.
[11] DAI Z, LAI G, YANG Y, et al. Funnel-Transformer: filtering out sequential redundancy for efficient language processing[C]//Proceedings of the 34th Conference on Neural Information Processing Systems, Vancouver, Canada, 2020.
[12] WANG Y, HUANG G, LI M, et al. Automatically constructing a fine-grained sentiment lexicon for sentiment analysis[J]. Cognitive Computation, 2022, 15(1): 254-271.
[13] JASSIM M A, ABD D H, OMRI M N. A survey of sentiment analysis from film critics based on machine learning, lexicon and hybridization[J]. Neural Computing and Applications, 2023, 35(13): 9437-9461.
[14] XU D, TIAN Z, LAI R, et al. Deep learning based emotion analysis of microblog texts[J]. Information Fusion, 2020, 64: 1-11.
[15] CHEN J, SUN C, ZHANG S, et al. Cross-modal dynamic sentiment annotation for speech sentiment analysis[J]. Computers & Electrical Engineering, 2023, 106: 108598.
[16] GU Y, CHEN S, MARSIC I. Deep multimodal learning for emotion recognition in spoken language[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2018: 5079-5083.
[17] ATMAJA B T, SASOU A, AKAGI M. Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion[J]. Speech Communication, 2022, 140: 11-28.
[18] PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017: 873-883.
[19] HAZARIKA D, PORIA S, MIHALCEA R, et al. ICON: interactive conversational memory network for multimodal emotion detection[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018: 2594-2604.
[20] MAJUMDER N, PORIA S, HAZARIKA D, et al. DialogueRNN: an attentive RNN for emotion detection in conversations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 6818-6825.
[21] GHOSAL D, MAJUMDER N, PORIA S, et al. DialogueGCN: a graph convolutional neural network for emotion recognition in conversation[J]. arXiv:1908.11540, 2019.
[22] WANG T, HOU Y, ZHOU D. A contextual attention network for multimodal emotion recognition in conversation[C]//Proceedings of the 2021 International Joint Conference on Neural Networks, Shenzhen, China, 2021: 1-7.
[23] JOSHI A, BHAT A, JAIN A, et al. COGMEN: contextualized GNN based multimodal emotion recognition[J]. arXiv:2205.02455, 2022.
[24] LAN Z, CHEN M, GOODMAN S, et al. ALBERT: a lite BERT for self-supervised learning of language representations[J]. arXiv:1909.11942, 2019.
[25] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv:1907.11692, 2019.