Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (3): 223-233. DOI: 10.3778/j.issn.1002-8331.2309-0470
• Pattern Recognition and Artificial Intelligence •
Deep Attention and Two-Stage Fusion of Image-Text Sentiment Contrastive Learning Method
YU Bengong, SHI Zhongyu
Online: 2025-02-01
Published: 2025-01-24
YU Bengong, SHI Zhongyu. Deep Attention and Two-Stage Fusion of Image-Text Sentiment Contrastive Learning Method[J]. Computer Engineering and Applications, 2025, 61(3): 223-233.
• Related Articles •
[1] DUAN Keke, ZHENG Junrong, YAN Ze. Unsupervised tracking combining moving object discovery and contrastive learning[J]. Computer Engineering and Applications, 2025, 61(4): 141-149.
[2] JI Yihao, REN Yizhi, YUAN Lifeng, LIU Rongke, PAN Gaoning. Event type induction combined with contrastive learning and iterative optimization[J]. Computer Engineering and Applications, 2025, 61(3): 196-211.
[3] LU Qiulin, WANG Huiying, ZHU Fengran, LI Quanxin, PANG Jun. Semi-supervised multi-graph classification combining graph neural network and graph contrastive learning[J]. Computer Engineering and Applications, 2025, 61(1): 368-374.
[4] LIU Qingwen, Mairidan·Wushouer, Gulanbaier·Tuerhong. Bi-bi-modality with bi-gated fusion in multimodal sentiment analysis[J]. Computer Engineering and Applications, 2024, 60(8): 165-172.
[5] HU Zhiqiang, LI Pengjun, WANG Jinlong, XIONG Xiaoyun. Research on policy tools classification based on ChatGPT augmentation and supervised contrastive learning[J]. Computer Engineering and Applications, 2024, 60(7): 292-305.
[6] YANG You, YAO Lu. Image-guided augmentation visual question answering model combined with contrastive learning[J]. Computer Engineering and Applications, 2024, 60(7): 157-166.
[7] ZHENG Yang, WU Yongming, XU An. Vectorized feature space embedded clustering based on contrastive learning[J]. Computer Engineering and Applications, 2024, 60(4): 211-219.
[8] XU Zhihong, QIU Penglin, WANG Liqin, DONG Yongfeng. Completion of temporal knowledge graph for historical contrastive learning[J]. Computer Engineering and Applications, 2024, 60(22): 154-161.
[9] LIU Mingming, LIU Bing, LIU Hao, ZHANG Haiyan. Diverse image captioning via conditional variational Transformer and introspective adversarial learning[J]. Computer Engineering and Applications, 2024, 60(21): 164-171.
[10] LI Baozhen, KONG Qianwen, SU Yuwei. Anomaly detection on attribute network by multi-angle contrastive learning[J]. Computer Engineering and Applications, 2024, 60(19): 167-177.
[11] WANG Lulu, XU Zengmin, ZHANG Xuelian, MENG Ruxing, LU Tao. Cross-view temporal contrastive learning for self-supervised video representation[J]. Computer Engineering and Applications, 2024, 60(18): 158-166.
[12] XU Yunfeng, FAN Hexun. Self-supervised graph representation learning method based on data and feature augmentation[J]. Computer Engineering and Applications, 2024, 60(17): 148-157.
[13] WANG Jinghong, WANG Hui. Self-supervised contrastive attributed graph joint representation clustering[J]. Computer Engineering and Applications, 2024, 60(16): 133-142.
[14] MIAO Borui, XU Yunfeng, ZHAO Shaojie, WANG Jialin. C-BGA: multimodal speech emotion recognition network combining contrastive learning[J]. Computer Engineering and Applications, 2024, 60(16): 168-176.
[15] QI Sheng, GAO Rong, SHAO Xiongkai, WU Xinyun, WAN Xiang, GAO Haiyan. Hypergraph-based meta-path explanation contrastive learning for group recommendation[J]. Computer Engineering and Applications, 2024, 60(11): 268-280.