[1] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, Oct 12-16, 2020. New York: ACM, 2020: 1122-1131.
[2] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 10790-10797.
[3] ZHENG J, ZHANG S, WANG X, et al. Multimodal representations learning based on mutual information maximization and minimization and identity embedding for multimodal sentiment analysis[J]. arXiv:2201.03969, 2022.
[4] LI X, CHEN M. Multimodal sentiment analysis with multi-perspective fusion network focusing on sense attentive language[C]//Proceedings of the 19th Chinese National Conference on Computational Linguistics, Haikou, China, Oct 30-Nov 1, 2020. [S.l.]: Chinese Information Processing Society of China, 2020: 359-373.
[5] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, Sep 7-11, 2017. Stroudsburg, PA: Association for Computational Linguistics, 2017: 1103-1114.
[6] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, Jul 15-20, 2018. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2247-2256.
[7] PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, Jul 30-Aug 4, 2017. Stroudsburg, PA: Association for Computational Linguistics, 2017: 873-883.
[8] CHEN F, LUO Z, XU Y, et al. Complementary fusion of multi-features and multi-modalities in sentiment analysis[J]. arXiv:1904.08138, 2019.
[9] MAJUMDER N, PORIA S, HAZARIKA D, et al. DialogueRNN: an attentive RNN for emotion detection in conversations[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, USA, Jan 27-Feb 1, 2019. Palo Alto, CA: AAAI Press, 2019: 6818-6825.
[10] CHOI W Y, SONG K Y, LEE C W. Convolutional attention networks for multimodal emotion recognition from speech and text data[C]//Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML), Melbourne, Australia, Jul 20, 2018. Stroudsburg, PA: Association for Computational Linguistics, 2018: 28-34.
[11] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, Jul 28-Aug 2, 2019. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6558-6569.
[12] SIRIWARDHANA S, REIS A, WEERASEKERA R, et al. Jointly fine-tuning "BERT-like" self-supervised models to improve multimodal speech emotion recognition[C]//Proceedings of the 21st Annual Conference of the International Speech Communication Association, Oct 25-29, 2020. [S.l.]: International Speech Communication Association, 2020: 3755-3759.
[13] SIRIWARDHANA S, KALUARACHCHI T, BILLINGHURST M, et al. Multimodal emotion recognition with transformer-based self-supervised feature fusion[J]. IEEE Access, 2020, 8: 176274-176285.
[14] ZHANG D, JU X, LI J, et al. Multi-modal multi-label emotion detection with modality and label dependence[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Nov 16-20, 2020. Stroudsburg, PA: Association for Computational Linguistics, 2020: 3584-3593.
[15] HE J, ZHANG C Q, LI X Z, et al. Survey of research on multimodal fusion technology for deep learning[J]. Computer Engineering, 2020, 46(5): 1-11.
[16] CHEN M, LI X. SWAFN: sentimental words aware fusion network for multimodal sentiment analysis[C]//Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (online), Dec 8-13, 2020. [S.l.]: International Committee on Computational Linguistics, 2020: 1067-1077.
[17] SUN Z, SARMA P, SETHARES W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, Feb 7-12, 2020. Palo Alto, CA: AAAI Press, 2020: 8992-8999.
[18] WU Y, LIN Z, ZHAO Y, et al. A text-centered shared-private framework via cross-modal prediction for multimodal sentiment analysis[C]//Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2-5, 2021. Stroudsburg, PA: Association for Computational Linguistics, 2021: 4730-4738.
[19] MAI S, HU H, XING S. Divide, conquer and combine: hierarchical feature fusion network with local and global perspectives for multimodal affective computing[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, Jul 28-Aug 2, 2019. Stroudsburg, PA: Association for Computational Linguistics, 2019: 481-492.
[20] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Jun 2-7, 2019. Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186.
[21] NG A. Sparse autoencoder[J]. CS294A Lecture Notes, 2011, 72: 1-19.
[22] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv:1606.06259, 2016.
[23] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, Feb 2-7, 2018. Palo Alto, CA: AAAI Press, 2018: 5634-5641.