Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (10): 171-179. DOI: 10.3778/j.issn.1002-8331.2201-0406

• Pattern Recognition and Artificial Intelligence •

Cross-Modal Modulating for Multimodal Sentiment Analysis

CHENG Zichen, LI Yan, GE Jiangwei, JIU Mengfei, ZHANG Jingwei   

  1. College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin 300387, China
  2. Tianjin Key Laboratory of Wireless Mobile Communications and Power Transmission, Tianjin 300387, China
  • Online: 2023-05-15  Published: 2023-05-15

Abstract: How to represent individual modalities effectively and fuse information across modalities efficiently has long been a central problem in multimodal sentiment analysis (MSA). Most existing work builds on the Transformer, modifying its self-attention module to achieve cross-modal fusion. However, Transformer-based fusion often ignores the relative importance of the different modalities, and the Transformer cannot effectively capture temporal features. To address these problems, a network combining cross-modal modulation with a multimodal gating module is proposed. It uses LSTM networks as the representation sub-networks for the visual and acoustic modalities and BERT for the text modality; an improved Transformer cross-modal modulation module fuses information across modalities; and a modal gating network is designed to mimic the way humans weigh and combine information from different modalities. Comparative experiments on the MOSI and MOSEI datasets show that the proposed method effectively improves sentiment classification accuracy.
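The two fusion ideas in the abstract can be sketched in a minimal NumPy illustration: cross-modal attention, where one modality's features query another's so the source modality modulates the target, followed by a softmax gate that weighs per-modality summary vectors before fusion. This is only a conceptual sketch under assumed shapes and weight layouts, not the authors' implementation; all function and parameter names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(target, source, Wq, Wk, Wv):
    """Single-head cross-modal attention (illustrative): the target
    modality supplies the queries, the source modality supplies keys
    and values, so source information modulates the target sequence.

    target: (T_t, d), source: (T_s, d); all W*: (d, d)."""
    q = target @ Wq                              # (T_t, d)
    k = source @ Wk                              # (T_s, d)
    v = source @ Wv                              # (T_s, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T_t, T_s)
    return softmax(scores, axis=-1) @ v          # (T_t, d)

def modal_gating(reps, W_gate, b_gate):
    """Softmax gate over per-modality summary vectors (illustrative):
    the gate scores how much each modality contributes to the fused
    sentiment representation.

    reps: list of n vectors of shape (d,); W_gate: (n*d, n)."""
    concat = np.concatenate(reps)                # (n*d,)
    gate = softmax(concat @ W_gate + b_gate)     # (n,), sums to 1
    fused = sum(g * r for g, r in zip(gate, reps))
    return fused, gate
```

In this reading, each modality sequence (LSTM outputs for visual/acoustic, BERT outputs for text) would pass through cross-modal attention blocks, be pooled to one vector per modality, and then be combined by the gate, so the model can down-weight an uninformative modality for a given sample.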

Key words: multimodal sentiment analysis (MSA), Transformer model, cross-modal modulation, multimodal gating network
