[1] WÖLLMER M, METALLINOU A, EYBEN F, et al. Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling[C]//Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Makuhari, Chiba, Japan, Sep 26-30, 2010.
[2] DATCU D, ROTHKRANTZ L J. Semantic audiovisual data fusion for automatic emotion recognition[M]//Emotion Recognition: A Pattern Analysis Approach. 2015: 411-435.
[3] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[J]. arXiv:1707.07250, 2017.
[4] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv:1806.00064, 2018.
[5] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, Jul 28-Aug 2, 2019: 6558-6569.
[6] DELBROUCK J B, TITS N, BROUSMICHE M, et al. A Transformer-based joint-encoding for emotion recognition and sentiment analysis[J]. arXiv:2006.15955, 2020.
[7] SAHAY S, OKUR E, KUMAR S H, et al. Low rank fusion based Transformers for multimodal sequences[J]. arXiv:2007.02038, 2020.
[8] PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017.
[9] MAJUMDER N, PORIA S, HAZARIKA D, et al. DialogueRNN: an attentive RNN for emotion detection in conversations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, Jan 27-Feb 1, 2019: 6818-6825.
[10] KUMAR A, VEPA J. Gated mechanism for attention based multimodal sentiment analysis[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
[11] HAZARIKA D, PORIA S, ZADEH A, et al. Conversational memory network for emotion recognition in dyadic dialogue videos[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2018.
[12] HAZARIKA D, PORIA S, MIHALCEA R, et al. ICON: interactive conversational memory network for multimodal emotion detection[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
[13] FU Y, OKADA S, WANG L, et al. CONSK-GCN: conversational semantic- and knowledge-oriented graph convolutional network for multimodal emotion recognition[C]//Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), 2021.
[14] LIAN Z, TAO J, LIU B, et al. Conversational emotion recognition using self-attention mechanisms and graph neural networks[C]//Proceedings of INTERSPEECH 2020, Shanghai, Oct 25-29, 2020: 2347-2351.
[15] SHENOY A, SARDANA A. Multilogue-Net: a context-aware RNN for multi-modal emotion detection and sentiment analysis in conversation[C]//Proceedings of the Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML), 2020.
[16] WANG T, HOU Y, ZHOU D, et al. A contextual attention network for multimodal emotion recognition in conversation[C]//Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, Jul 18-22, 2021: 1-7.
[17] JOSHI A, BHAT A, JAIN A, et al. COGMEN: contextualized GNN based multimodal emotion recognition[C]//Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022.
[18] YUN S, JEONG M, KIM R, et al. Graph transformer networks[C]//Advances in Neural Information Processing Systems, 2019.
[19] SHI Y, HUANG Z, FENG S, et al. Masked label prediction: unified message passing model for semi-supervised classification[J]. arXiv:2009.03509, 2020.
[20] MELIS G, KOČISKÝ T, BLUNSOM P. Mogrifier LSTM[C]//Proceedings of the International Conference on Learning Representations, 2020.
[21] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017.
[22] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42: 335-359.
[23] ZADEH A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018.
[24] EYBEN F, WÖLLMER M, SCHULLER B. openSMILE: the Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia, New York, 2010: 1459-1462.
[25] BALTRUSAITIS T, ZADEH A, LIM Y C, et al. OpenFace 2.0: facial behavior analysis toolkit[C]//Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, May 15-19, 2018: 59-66.
[26] REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using Siamese BERT-networks[J]. arXiv:1908.10084, 2019.
[27] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.