Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (13): 124-135. DOI: 10.3778/j.issn.1002-8331.2302-0238
• Pattern Recognition and Artificial Intelligence •
Cross-Modal Transformer Combination Model for Sentiment Analysis
WANG Liang, WANG Yi, WANG Jun
Online: 2024-07-01
Published: 2024-07-01
WANG Liang, WANG Yi, WANG Jun. Cross-Modal Transformer Combination Model for Sentiment Analysis[J]. Computer Engineering and Applications, 2024, 60(13): 124-135.
Add to citation manager: EndNote | RIS | BibTeX
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2302-0238
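For readers citing by hand rather than through the export links above, a BibTeX entry assembled from this page's metadata might look as follows. This is a sketch: the entry key wang2024crossmodal and the field layout are assumptions of this illustration, not the journal's official BibTeX export, which may differ.

    % NOTE: hand-written sketch; entry key and field layout are assumed,
    % not taken from the journal's own citation-manager export.
    @article{wang2024crossmodal,
      author  = {Wang, Liang and Wang, Yi and Wang, Jun},
      title   = {Cross-Modal Transformer Combination Model for Sentiment Analysis},
      journal = {Computer Engineering and Applications},
      year    = {2024},
      volume  = {60},
      number  = {13},
      pages   = {124--135},
      doi     = {10.3778/j.issn.1002-8331.2302-0238}
    }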
References
[1] KUMAR A, VEPA J. Gated mechanism for attention based multimodal sentiment analysis[C]//Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, 2020: 4477-4481.
[2] ZHANG Y Z, RONG L, SONG D W, et al. A review of multimodal sentiment analysis[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 426-438.
[3] MENG Y, HUANG J X, ZHANG Y, et al. Generating training data with language models: towards zero-shot language understanding[J/OL]. (2022-10-12)[2023-01-10]. https://arxiv.org/abs/2202.04538v2.
[4] ZHANG F, LI X C, LIM C P, et al. Deep emotional arousal network for multimodal sentiment analysis and emotion recognition[J]. Information Fusion, 2022, 5(7): 88-91.
[5] YANG L, NA J C, YU J F. Cross-modal multitask Transformer for end-to-end multimodal aspect-based sentiment analysis[J]. Information Processing and Management, 2022, 4(8): 59-64.
[6] YANG M P, LI Y Y, ZHANG H. GME-Dialogue-NET: gated multi-modal sentiment analysis model based on fusion mechanism[J]. Academic Journal of Computing & Information Science, 2021, 5(3): 4-12.
[7] XIAO G R, TU G, ZHENG L, et al. Multimodality sentiment analysis in social Internet of Things based on hierarchical attentions and CSAT-TCN with MBM network[J]. IEEE Internet of Things Journal, 2021, 6(5): 8-24.
[8] HUDDAR M G, SANNAKKI S, RAJPUROHIT V. Attention-based multi-modal sentiment analysis and emotion detection in conversation using RNN[J]. International Journal of Interactive Multimedia and Artificial Intelligence, 2021, 7(8): 6-12.
[9] HUDDAR M G, SANNAKKI S, RAJPUROHIT V. Multi-level context extraction and attention-based contextual inter-modal fusion for multimodal sentiment analysis and emotion classification[J]. International Journal of Multimedia Information Retrieval, 2019, 2(3): 9-11.
[10] WANG Y K, CHEN X H, CAO L L, et al. Multimodal token fusion for vision Transformers[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022: 12176-12185.
[11] YANG Y, ZHAN D C, JIANG Y, et al. A survey of reliable multimodal learning[J]. Journal of Software, 2021, 32(4): 1067-1081.
[12] ARJMAND M, DOUSTI M, MORADI H. TEASEL: a Transformer-based speech-prefixed language model[J/OL]. (2021-09-12)[2022-11-13]. https://arxiv.org/abs/2109.05522v1.
[13] TAN H H, BANSAL M. LXMERT: learning cross-modality encoder representations from Transformers[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 2019: 5100-5111.
[14] YU W M, XU H, YUAN Z Q, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 10790-10797.
[15] RAHMAN W, HASAN M, LEE S W, et al. Integrating multimodal information in large pretrained Transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 2359-2369.
[16] GUO X D, WANG Y D, MIAO Z J, et al. ER-MRL: emotion recognition based on multimodal representation learning[C]//Proceedings of the 2022 12th International Conference on Information Science and Technology (ICIST), Kaifeng, China, 2022: 421-428.
[17] BERREBBI D, SHI J T, YAN B, et al. Combining spectral and self-supervised features for low-resource speech recognition and translation[J/OL]. (2022-04-18)[2022-11-15]. https://arxiv.org/abs/2204.02470v2.
[18] KIKUTSUJI T, MORI Y, OKAZAKI K, et al. Explaining reaction coordinates of alanine dipeptide isomerization obtained from deep neural networks using explainable artificial intelligence[J/OL]. (2022-04-01)[2022-11-18]. https://arxiv.org/abs/2202.07276v3.
[19] BAEVSKI A, ZHOU H, MOHAMED A R, et al. Wav2vec 2.0: a framework for self-supervised learning of speech representations[J/OL]. (2020-10-22)[2022-12-13]. https://arxiv.org/abs/2006.11477.
[20] AKHTAR M S, CHAUHAN D S, EKBAL A. A deep multi-task contextual attention framework for multi-modal affect analysis[J]. ACM Transactions on Knowledge Discovery from Data, 2020, 5(9): 14-17.
[21] LUPPINO L T, HANSEN M A, KAMPFFMEYER M, et al. Code-aligned autoencoders for unsupervised change detection in multimodal remote sensing images[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 60-72.
[22] HUANG J, LIN Z H, YANG Z G, et al. Temporal graph convolutional network for multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction, New York, NY, USA, 2021: 239-247.
[23] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 2020: 1122-1131.
[24] CHAUHAN D S, EKBAL A, BHATTACHARYYA P. An efficient fusion mechanism for multimodal low-resource setting[C]//Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2022: 2583-2588.
[25] FU Z W, LIU F, XU Q, et al. NHFNET: a non-homogeneous fusion network for multimodal sentiment analysis[C]//Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, China, 2022: 1-6.
[26] AL-AZANI S, EL-ALFY E S M. Enhanced video analytics for sentiment analysis based on fusing textual, auditory and visual information[J]. IEEE Access, 2020, 8: 136843-136857.
[27] DU P F, LI X Y, GAO Y L. A survey of multimodal visual language representation learning[J]. Journal of Software, 2021, 32(2): 327-348.
[28] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019: 4171-4186.
[29] ZHU Z L, RAO Y, WU Y, et al. Research progress of attention mechanism in deep learning[J]. Journal of Chinese Information Processing, 2019, 33(6): 1-11.
[30] XU H F, GENABITH J V, XIONG D Y, et al. Learning source phrase representations for neural machine translation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 386-396.
[31] VASWANI A, SHAZEER N M, PARMAR N, et al. Attention is all you need[J/OL]. (2017-12-06)[2022-12-08]. https://arxiv.org/abs/1706.03762v5.
[32] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J/OL]. (2016-08-12)[2022-12-06]. https://arxiv.org/abs/1606.06259.
[33] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, 2021: 9180-9192.
[34] HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction, New York, NY, USA, 2021: 6-15.
[35] WANG Z L, WAN Z H, WAN X J. TransModality: an End2End fusion method with Transformer for multimodal sentiment analysis[C]//Proceedings of The Web Conference 2020, New York, NY, USA, 2020: 2514-2520.
[36] SUN H, WANG H Y, LIU J Q, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proceedings of the 30th ACM International Conference on Multimedia, New York, NY, USA, 2022: 3722-3729.
[37] YANG K C, XU H, GAO K. CM-BERT: cross-modal BERT for text-audio sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 2020: 521-528.
Related Articles
[1] XU Zhihong, ZHANG Tianrun, WANG Liqin, DONG Yongfeng. Temporal Knowledge Graph Reasoning with Graph Reconstruction[J]. Computer Engineering and Applications, 2024, 60(9): 181-187.
[2] LIU Shipeng, NING Dejun, MA Jue. LSTformer Model for Photovoltaic Power Prediction[J]. Computer Engineering and Applications, 2024, 60(9): 317-325.
[3] PENG Kai, MA Fangling, XU Bo, GUO Jialu, HU Menglan. Active Microservice Fine-Grained Scaling Algorithm[J]. Computer Engineering and Applications, 2024, 60(8): 274-286.
[4] YANG Xi, GUO Junjun, YAN Haining, TAN Kaiwen, XIANG Yan, YU Zhengtao. Dynamic Dominant Fusion Multimodal Sentiment Analysis Method Based on Autoencoder[J]. Computer Engineering and Applications, 2024, 60(6): 180-187.
[5] CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei. Multimodal Sentiment Analysis Based on Information Bottleneck[J]. Computer Engineering and Applications, 2024, 60(2): 137-146.
[6] LI Pengfei, HE Yang, WU Jianhong. Spatio-Temporal Network Interest Point Recommendation Algorithm Fusing Global Features[J]. Computer Engineering and Applications, 2024, 60(11): 75-83.
[7] SUN Zhen, LI Xinfu. Named Entity Recognition of Chinese Electronic Medical Records Based on Multi-Feature Fusion[J]. Computer Engineering and Applications, 2023, 59(23): 136-144.
[8] LIU Yalin, LU Tianliang. Deepfake Video Detection Method Improved by GRU and Involution[J]. Computer Engineering and Applications, 2023, 59(22): 276-283.
[9] ZOU Jie, LI Lu. Stock Price Prediction Research Based on RF-SA-GRU Model[J]. Computer Engineering and Applications, 2023, 59(15): 300-309.
[10] WANG Qingrong, ZHOU Yutong, ZHU Changfeng, WU Yuyu. Road Network Traffic Accident Risk Prediction Based on Spatio-Temporal Graph Convolution Network[J]. Computer Engineering and Applications, 2023, 59(13): 266-272.
[11] HUANG Jian, WANG Ying. Image-Text Fusion Sentiment Analysis Method Based on Image Semantic Translation[J]. Computer Engineering and Applications, 2023, 59(11): 180-187.
[12] PENG Nan, GUO Jianfeng, ZHANG Wenxuan, WANG Jing, TAO Kai, HOU Senquan. Research on Prediction Technology of Safety Index of Power Supply System Based on Deep Learning[J]. Computer Engineering and Applications, 2023, 59(10): 314-320.
[13] CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei. Cross-Modal Modulating for Multimodal Sentiment Analysis[J]. Computer Engineering and Applications, 2023, 59(10): 171-179.
[14] CHAI Ruimin, YIN Chen. User Relationship and Context-Aware Next Point of Interest Recommendation[J]. Computer Engineering and Applications, 2022, 58(7): 197-205.
[15] WU Di, JIANG Liting, WANG Lulu, Tuergen Yibulayin, Aishan Wumaier, Zaokere Kadder. Research on Classification of Tourist Questions Combined with Multi-head Attention Mechanism[J]. Computer Engineering and Applications, 2022, 58(3): 165-171.