Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (2): 1-18. DOI: 10.3778/j.issn.1002-8331.2305-0439
• Research Hotspots and Reviews •
GUO Xu, Mairidan Wushouer, Gulanbaier Tuerhong
Online: 2024-01-15
Published: 2024-01-15
郭续,买日旦·吾守尔,古兰拜尔·吐尔洪
GUO Xu, Mairidan Wushouer, Gulanbaier Tuerhong. Survey of Sentiment Analysis Algorithms Based on Multimodal Fusion[J]. Computer Engineering and Applications, 2024, 60(2): 1-18.
郭续, 买日旦·吾守尔, 古兰拜尔·吐尔洪. 基于多模态融合的情感分析算法研究综述[J]. 计算机工程与应用, 2024, 60(2): 1-18.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2305-0439
[1] CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei. Multimodal Sentiment Analysis Based on Information Bottleneck[J]. Computer Engineering and Applications, 2024, 60(2): 137-146.
[2] HOU Yujie, LIANG Chengji. Optimization of Container Multimodal Transport Network Based on Underground Logistics System[J]. Computer Engineering and Applications, 2024, 60(2): 314-325.
[3] LIU Hualing, CHEN Shanghui, QIAO Liang, LIU Yaxin. Multimodal False News Detection Based on Fusion Attention Mechanism[J]. Computer Engineering and Applications, 2023, 59(9): 95-103.
[4] CAI Zhengyi, ZHAO Jieyu, ZHU Feng. Single-Stage Object Detection with Fusion of Point Cloud and Image Feature[J]. Computer Engineering and Applications, 2023, 59(9): 140-149.
[5] LI Zhuorong, TANG Yunqi. Multimodal Biometric Fusion Model Based on Deep Learning[J]. Computer Engineering and Applications, 2023, 59(7): 180-189.
[6] LI Jianxin, SI Guannan, TIAN Pengxin, AN Zhaoliang, ZHOU Fengyu. Survey of 3D Scene Recognition and Representation Methods of Multimodal Knowledge[J]. Computer Engineering and Applications, 2023, 59(20): 35-50.
[7] PAN Mengzhu, LI Qianmu, QIU Tian. Survey of Research on Deep Multimodal Representation Learning[J]. Computer Engineering and Applications, 2023, 59(2): 48-64.
[8] JING Li, YAO Ke. Research on Text Classification Based on Knowledge Graph and Multimodal[J]. Computer Engineering and Applications, 2023, 59(2): 102-109.
[9] WEI Yuqi, LI Ning. Cross-Modal Information Interaction Reasoning Network for Image and Text Retrieval[J]. Computer Engineering and Applications, 2023, 59(16): 115-124.
[10] NIE Xiongfeng, WANG Junying, DONG Fangmin, ZANG Zhaoxiang, JIANG Shu. Multimodal Animation Style Transfer Method Fused with Attention Mechanism[J]. Computer Engineering and Applications, 2023, 59(15): 223-234.
[11] ZENG Xiangjiu, LIU Dawei, LIU Yifan, ZHAO Zhibin, LIU Xiumei, REN Yougui. News Short Video Classification Model Fusing Multimodal Feature[J]. Computer Engineering and Applications, 2023, 59(14): 107-113.
[12] MIAO Yuqing, DONG Han, ZHANG Wanzhen, ZHOU Ming, CAI Guoyong, DU Huawei. Cross-Modal Video Emotion Analysis Method Based on Multi-Task Learning[J]. Computer Engineering and Applications, 2023, 59(12): 141-147.
[13] HUANG Jian, WANG Ying. Image-Text Fusion Sentiment Analysis Method Based on Image Semantic Translation[J]. Computer Engineering and Applications, 2023, 59(11): 180-187.
[14] XIANG Deping, ZHANG Pu, XIANG Shiming, PAN Chunhong. Multi-Modal Meteorological Forecasting Based on Transformer[J]. Computer Engineering and Applications, 2023, 59(10): 94-103.
[15] CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei. Cross-Modal Modulating for Multimodal Sentiment Analysis[J]. Computer Engineering and Applications, 2023, 59(10): 171-179.