[1] BALTRUSAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443.
[2] ZHANG Q A, SHI L, LIU P Y, et al. ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis[J]. Applied Intelligence, 2023, 53(12): 16332-16345.
[3] WANG D, GUO X T, TIAN Y M, et al. TETFN: a text enhanced transformer fusion network for multimodal sentiment analysis[J]. Pattern Recognition, 2023, 136: 109259.
[4] SHAYAA S, JAAFAR N I, BAHRI S, et al. Sentiment analysis of big data: methods, applications, and open challenges[J]. IEEE Access, 2018, 6: 37807-37827.
[5] YING C C, WU Z, DAI X Y, et al. Opinion transmission network for jointly improving aspect-oriented opinion words extraction and sentiment classification[C]//Proceedings of the 9th CCF International Conference on Natural Language Processing and Chinese Computing. Cham: Springer, 2020: 629-640.
[6] LI R F, CHEN H, FENG F X, et al. Dual graph convolutional networks for aspect-based sentiment analysis[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 6319-6329.
[7] D’MELLO S K, KORY J. A review and meta-analysis of multimodal affect detection systems[J]. ACM Computing Surveys, 2015, 47(3): 1-36.
[8] DAS R, SINGH T D. Multimodal sentiment analysis: a survey of methods, trends, and challenges[J]. ACM Computing Surveys, 2023, 55(13S): 1-38.
[9] ZHANG Y Z, SONG D W, LI X, et al. A quantum-like multimodal network framework for modeling interaction dynamics in multiparty conversational sentiment analysis[J]. Information Fusion, 2020, 62: 14-31.
[10] YOU Q Z, LUO J B, JIN H L, et al. Cross-modality consistent regression for joint visual-textual sentiment analysis of social multimedia[C]//Proceedings of the 9th ACM International Conference on Web Search and Data Mining. New York: ACM, 2016: 13-22.
[11] FEDUS W, ZOPH B, SHAZEER N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity[J]. Journal of Machine Learning Research, 2022, 23(1): 5232-5270.
[12] ZHANG Q, FU J L, LIU X Y, et al. Adaptive co-attention network for named entity recognition in tweets[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 5674-5681.
[13] LI Y, DING H, LIN Y M, et al. Multi-level textual-visual alignment and fusion network for multimodal aspect-based sentiment analysis[J]. Artificial Intelligence Review, 2024, 57(4): 78.
[14] YANG L, NA J C, YU J F. Cross-modal multitask transformer for end-to-end multimodal aspect-based sentiment analysis[J]. Information Processing & Management, 2022, 59(5): 103038.
[15] LING Y, YU J F, XIA R. Vision-language pre-training for multimodal aspect-based sentiment analysis[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 2149-2159.
[16] JACOBS R A, JORDAN M I, NOWLAN S J, et al. Adaptive mixtures of local experts[J]. Neural Computation, 1991, 3(1): 79-87.
[17] JORDAN M I, JACOBS R A. Hierarchical mixtures of experts and the EM algorithm[J]. Neural Computation, 1994, 6(2): 181-214.
[18] SHAZEER N, MIRHOSEINI A, MAZIARZ K, et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer[C]//Proceedings of the 5th International Conference on Learning Representations, 2017.
[19] SHAZEER N, CHENG Y L, PARMAR N, et al. Mesh-TensorFlow: deep learning for supercomputers[C]//Advances in Neural Information Processing Systems 31, 2018: 10435-10444.
[20] LEPIKHIN D, LEE H J, XU Y, et al. GShard: scaling giant models with conditional computation and automatic sharding[C]//Proceedings of the 9th International Conference on Learning Representations, 2021.
[21] CELIK O, ZHOU D, LI G, et al. Specializing versatile skill libraries using local mixture of experts[C]//Proceedings of the 2021 Conference on Robot Learning, 2021: 1423-1433.
[22] CHEN Z T, SHEN Y K, DING M Y, et al. Mod-Squad: designing mixtures of experts as modular multi-task learners[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 11828-11837.
[23] LEWIS M, BHOSALE S, DETTMERS T, et al. Base layers: simplifying training of large, sparse models[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 6265-6274.
[24] MUSTAFA B, RIQUELME C, PUIGCERVER J, et al. Multimodal contrastive learning with LIMoE: the language-image mixture of experts[C]//Advances in Neural Information Processing Systems 35, 2022: 9564-9576.
[25] RIQUELME C, PUIGCERVER J, MUSTAFA B, et al. Scaling vision with sparse mixture of experts[C]//Advances in Neural Information Processing Systems 34, 2021: 8583-8595.
[26] FAN A, BHOSALE S, SCHWENK H, et al. Beyond English-centric multilingual machine translation[J]. Journal of Machine Learning Research, 2021, 22(1): 4839-4886.
[27] CAO B, SUN Y M, ZHU P F, et al. Multi-modal gated mixture of local-to-global experts for dynamic image fusion[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 23498-23507.
[28] YU J F, JIANG J, YANG L, et al. Improving multimodal named entity recognition via entity span detection with unified multimodal transformer[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3342-3352.
[29] JU X C, ZHANG D, XIAO R, et al. Joint multi-modal aspect-sentiment analysis with auxiliary cross-modal relation detection[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 4395-4405.
[30] YU J F, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019: 5408-5414.
[31] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 8748-8763.
[32] HAZARIKA D, PORIA S, ZADEH A, et al. Conversational memory network for emotion recognition in dyadic dialogue videos[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2018: 2122-2132.
[33] CHEN P, SUN Z Q, BING L D, et al. Recurrent attention network on memory for aspect sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2017: 452-461.
[34] ZADEH A, CHEN M H, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2017: 1103-1114.
[35] XU N, MAO W J, CHEN G D. Multi-interactive memory network for aspect based multimodal sentiment analysis[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 371-378.
[36] YU J F, JIANG J, XIA R. Entity-sensitive attention and fusion network for entity-level multimodal sentiment classification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 28: 429-439.