[1] BALTRUSAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443.
[2] MAI S, HU H, XING S. A unimodal representation learning and recurrent decomposition fusion structure for utterance-level multimodal embedding learning[J]. IEEE Transactions on Multimedia, 2022, 24: 2488-2501.
[3] GRAVES A. Long short-term memory[M]//Supervised sequence labelling with recurrent neural networks. Berlin, Heidelberg: Springer, 2012, 385: 37-45.
[4] MAI S, XING S, HU H. Locally confined modality fusion network with a global perspective for multimodal human affective computing[J]. IEEE Transactions on Multimedia, 2020, 22(1): 122-137.
[5] MAI S, HU H, XING S. Divide, conquer and combine: hierarchical feature fusion network with local and global perspectives for multimodal affective computing[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019: 481-492.
[6] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017: 1103-1114.
[7] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018: 2247-2256.
[8] CHENG Z C, LI Y, GE J W, et al. Cross-modal modulating for multimodal sentiment analysis[J]. Computer Engineering and Applications, 2023, 59(10): 171-179.
[9] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 2021: 9180-9192.
[10] ZENG Y, MAI S, HU H. Which is making the contribution: modulating unimodal and cross-modal dynamics for multimodal sentiment analysis[C]//Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 2021: 1262-1274.
[11] MAI S, ZENG Y, ZHENG S, et al. Hybrid contrastive learning of tri-modal representation for multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2023, 14(3): 2276-2289.
[12] MAI S, HU H, XING S. Modality to modality translation: an adversarial representation learning and graph fusion network for multimodal fusion[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 164-172.
[13] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 10790-10797.
[14] WANG Y, WU J, FURUMAI K, et al. VAE-based adversarial multimodal domain transfer for video-level sentiment analysis[J]. IEEE Access, 2022, 10: 51315-51324.
[15] TISHBY N, PEREIRA F C, BIALEK W. The information bottleneck method[J]. arXiv:physics/0004057,2000.
[16] ALEMI A A, FISCHER I, DILLON J V, et al. Deep variational information bottleneck[J]. arXiv:1612.00410,2016.
[17] ZAIDI A, ESTELLA-AGUERRI I, SHAMAI (SHITZ) S. On the information bottleneck problems: models, connections, applications and information theoretic views[J]. Entropy, 2020, 22(2): 151.
[18] WAN Z, ZHANG C, ZHU P, et al. Multi-view information-bottleneck representation learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 10085-10092.
[19] MAI S, ZENG Y, HU H. Multimodal information bottleneck: learning minimal sufficient unimodal and multimodal representations[J]. IEEE Transactions on Multimedia, 2023, 25: 4121-4134.
[20] BELGHAZI M I, BARATIN A, RAJESHWAR S, et al. Mutual information neural estimation[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 531-540.
[21] CHENG P, HAO W, DAI S, et al. CLUB: a contrastive log-ratio upper bound of mutual information[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 1779-1788.
[22] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, NY, USA: Association for Computing Machinery, 2020: 1122-1131.
[23] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, Long Beach, CA, USA, 2017: 5998-6008.
[24] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019: 4171-4186.
[25] DELBROUCK J B, TITS N, DUPONT S. Modulated fusion using transformer for linguistic-acoustic emotion recognition[C]//Proceedings of the First International Workshop on Natural Language Processing Beyond Text, 2020.
[26] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019: 6558-6569.
[27] CHENG J, FOSTIROPOULOS I, BOEHM B, et al. Multimodal phased transformer for sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021: 2447-2458.
[28] ZHANG Q, SHI L, LIU P, et al. ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis[J]. Applied Intelligence, 2023, 53: 16332-16345.
[29] QI Q, LIN L, ZHANG R, et al. MEDT: using multimodal encoding-decoding network as in transformer for multimodal sentiment analysis[J]. IEEE Access, 2022, 10: 28750-28759.
[30] YANG B, SHAO B, WU L, et al. Multimodal sentiment analysis with unidirectional modality translation[J]. Neurocomputing, 2022, 467: 130-137.
[31] SUN H, WANG H, LIU J, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proceedings of the 30th ACM International Conference on Multimedia, 2022: 3722-3729.
[32] RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
[33] YANG K, XU H, GAO K. CM-BERT: cross-modal BERT for text-audio sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, NY, USA: Association for Computing Machinery, 2020: 521-528.
[34] LUO H, JI L, HUANG Y, et al. ScaleVLAD: improving multimodal sentiment analysis via multi-scale fusion of locally descriptors[J]. arXiv:2112.01368,2021.
[35] TISHBY N, ZASLAVSKY N. Deep learning and the information bottleneck principle[C]//2015 IEEE Information Theory Workshop (ITW), 2015: 1-5.
[36] FEDERICI M, DUTTA A, FORRÉ P, et al. Learning robust representations via multi-view information bottleneck[J]. arXiv:2002.07017,2020.
[37] LEE C, VAN DER SCHAAR M. A variational information bottleneck approach to multi-omics data integration[C]//Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021: 1513-1521.
[38] DONSKER M D, VARADHAN S R S. Asymptotic evaluation of certain Markov process expectations for large time—III[J]. Communications on Pure and Applied Mathematics, 1976, 29(4): 389-461.
[39] NGUYEN X, WAINWRIGHT M J, JORDAN M I. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization[C]//Advances in Neural Information Processing Systems, 2007.
[40] NOWOZIN S, CSEKE B, TOMIOKA R. f-GAN: training generative neural samplers using variational divergence minimization[C]//Advances in Neural Information Processing Systems, 2016.
[41] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv:1606.06259,2016.
[42] BAGHER ZADEH A, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics, 2018: 2236-2246.
[43] YU W, XU H, MENG F, et al. CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 3718-3727.
[44] DEGOTTEX G, KANE J, DRUGMAN T, et al. COVAREP—a collaborative voice analysis repository for speech technologies[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014: 960-964.
[45] VAN DEN OORD A, LI Y, VINYALS O. Representation learning with contrastive predictive coding[J]. arXiv:1807.03748,2018.
[46] POOLE B, OZAIR S, VAN DEN OORD A, et al. On variational bounds of mutual information[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 5171-5180.