[1] PEÑA D, AGUILERA A, DONGO I, et al. A framework to evaluate fusion methods for multimodal emotion recognition[J]. IEEE Access, 2023, 11: 10218-10237.
[2] ZHAO S, JIA G, YANG J, et al. Emotion recognition from multiple modalities: fundamentals and methodologies[J]. IEEE Signal Processing Magazine, 2021, 38(6): 59-73.
[3] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions[J]. Information Fusion, 2023, 91: 424-444.
[4] HUANG C, ZHANG J, WU X, et al. TeFNA: text-centered fusion network with crossmodal attention for multimodal sentiment analysis[J]. Knowledge-Based Systems, 2023, 269: 110502.
[5] HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction, 2021: 6-15.
[6] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[J]. arXiv:1707.07250, 2017.
[7] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv:1806.00064, 2018.
[8] BALTRUŠAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423-443.
[9] JANGRA A, MUKHERJEE S, JATOWT A, et al. A survey on multi-modal summarization[J]. ACM Computing Surveys, 2023, 55(13S): 1-36.
[10] NGIAM J, KHOSLA A, KIM M, et al. Multimodal deep learning[C]//Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011: 689-696.
[11] NGUYEN D, NGUYEN K, SRIDHARAN S, et al. Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition[J]. Computer Vision and Image Understanding, 2018, 174: 33-42.
[12] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia, 2020: 1122-1131.
[13] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 10790-10797.
[14] YU J, JIANG J, XIA R. Entity-sensitive attention and fusion network for entity-level multimodal sentiment classification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 28: 429-439.
[15] XIAO G, TU G, ZHENG L, et al. Multimodality sentiment analysis in social Internet of things based on hierarchical attentions and CSAT-TCN with MBM network[J]. IEEE Internet of Things Journal, 2021, 8(16): 12748-12757.
[16] ZHAO J, YANG F. Fusion with GCN and SE-ResNeXt network for aspect based multimodal sentiment analysis[C]//Proceedings of the 2023 IEEE 6th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), 2023: 336-340.
[17] OU Y, CHEN Z, WU F. Multimodal local-global attention network for affective video content analysis[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 31(5): 1901-1914.
[18] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[19] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.
[20] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the International Conference on Machine Learning, 2021: 8748-8763.
[21] ZHU R, HAN C, QIAN Y, et al. Exchanging-based multimodal fusion with Transformer[J]. arXiv:2309.02190, 2023.
[22] JI Y, WANG J, GONG Y, et al. MAP: multimodal uncertainty-aware vision-language pre-training model[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 23262-23271.
[23] BANNUR S, HYLAND S, LIU Q, et al. Learning to exploit temporal structure for biomedical vision-language processing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 15016-15027.
[24] WU W, WANG X, LUO H, et al. Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 6620-6630.
[25] ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88.
[26] SAHAY S, OKUR E, KUMAR S H, et al. Low rank fusion based transformers for multimodal sequences[J]. arXiv:2007.02038, 2020.
[27] XU M, LIANG F, SU X, et al. CMJRT: cross-modal joint representation Transformer for multimodal sentiment analysis[J]. IEEE Access, 2022, 10: 131671-131679.
[28] FU Z, LIU F, WANG H, et al. LMR-CBT: learning modality-fused representations with CB-transformer for multimodal emotion recognition from unaligned multimodal sequences[J]. arXiv:2112.01697, 2021.
[29] LV F, CHEN X, HUANG Y, et al. Progressive modality reinforcement for human multimodal emotion recognition from unaligned multimodal sequences[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 2554-2562.
[30] WANG Y, LI Y, BELL P, et al. Cross-attention is not enough: incongruity-aware multimodal sentiment analysis and emotion recognition[J]. arXiv:2305.13583, 2023.