[1] YUAN Z Q, LI W, XU H, et al. Transformer-based feature reconstruction network for robust multimodal sentiment analysis[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4400-4407.
[2] SUN H, CHEN Y W, LIN L F. TensorFormer: a tensor-based multimodal transformer for multimodal sentiment analysis and depression detection[J]. IEEE Transactions on Affective Computing, 2023, 14(4): 2776-2786.
[3] HOU M, TANG J J, ZHANG J H, et al. Deep multimodal multilinear fusion with high-order polynomial pooling[C]//Advances in Neural Information Processing Systems, 2019: 12156-12166.
[4] WU Y, LIN Z J, ZHAO Y Y, et al. A text-centered shared-private framework via cross-modal prediction for multimodal sentiment analysis[C]//Findings of the Association for Computational Linguistics. Stroudsburg: ACL, 2021: 4730-4738.
[5] HU G M, LIN T E, ZHAO Y, et al. UniMSE: towards unified multimodal sentiment analysis and emotion recognition[J]. arXiv:2211.11256, 2022.
[6] ZADEH A, CHEN M H, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[J]. arXiv:1707.07250, 2017.
[7] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv:1806.00064, 2018.
[8] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[9] TSAI Y H, LIANG P P, ZADEH A, et al. Learning factorized multimodal representations[J]. arXiv:1806.06176, 2018.
[10] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1122-1131.
[11] RAHMAN W, HASAN M K, LEE S W, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2359-2369.
[12] YU W M, XU H, YUAN Z Q, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 10790-10797.
[13] TSAI Y H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 6558-6569.
[14] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[15] DEGOTTEX G, KANE J, DRUGMAN T, et al. COVAREP: a collaborative voice analysis repository for speech technologies[C]//Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2014: 960-964.
[16] CHEONG J H, JOLLY E, XIE T K, et al. Py-Feat: Python facial expression analysis toolbox[J]. Affective Science, 2023, 4(4): 781-796.
[17] LIN H, ZHANG P L, LING J D, et al. PS-Mixer: a polar-vector and strength-vector mixer model for multimodal sentiment analysis[J]. Information Processing & Management, 2023, 60(2): 103229.
[18] LEE J, KIM S, KIM S, et al. Context-aware emotion recognition networks[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2020: 10142-10151.
[19] GENG X Y, LIU H, LEE L S, et al. Multimodal masked autoencoders learn transferable representations[J]. arXiv:2205.14204, 2022.
[20] MCLACHLAN G J. Mahalanobis distance[J]. Resonance, 1999, 4(6): 20-26.
[21] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv:1606.06259, 2016.
[22] BAGHER ZADEH A, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 2236-2246.
[23] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[J]. arXiv:2109.00412, 2021.
[24] WANG D, LIU S, WANG Q, et al. Cross-modal enhancement network for multimodal sentiment analysis[J]. IEEE Transactions on Multimedia, 2023, 25: 4909-4921.
[25] WANG D, GUO X T, TIAN Y M, et al. TETFN: a text enhanced transformer fusion network for multimodal sentiment analysis[J]. Pattern Recognition, 2023, 136: 109259.
[26] LIU W L, XU H, HUA Y, et al. AdaFN-AG: enhancing multimodal interaction with adaptive feature normalization for multimodal sentiment analysis[J]. Intelligent Systems with Applications, 2024, 23: 200410.