Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (7): 92-100. DOI: 10.3778/j.issn.1002-8331.2210-0288

• Pattern Recognition and Artificial Intelligence •

Multiview Interaction Learning Network for Multimodal Aspect-Level Sentiment Analysis

WANG Xuyang, PANG Wenqian, ZHAO Lijie   

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2024-04-01  Published: 2024-04-01


Abstract: Previous multimodal aspect-level sentiment analysis methods use only the generic text and image representations produced by pre-trained models. Such representations are insensitive to the correlation between aspects and opinion words, and the contribution of image information to each word representation cannot be obtained dynamically, so these methods cannot fully capture the correlation between the modalities and the aspects. To address these problems, a multiview interaction learning network is proposed. Sentence features are extracted separately from a context view and a syntax view, so that the global features of the text are fully exploited during multimodal interaction. The relationships among the text, the image, and the aspect are modeled to realize multimodal interaction. At the same time, the interactive representations of the different modalities are fused to dynamically obtain the contribution of visual information to each word in the text, fully extracting the correlation between modalities and aspects. Finally, the sentiment classification result is obtained through a fully connected layer and a Softmax layer. Experiments on two datasets show that the proposed model effectively improves multimodal aspect-level sentiment classification.
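The following is a minimal PyTorch sketch of the dynamic gated fusion idea the abstract describes: each word attends to image regions under aspect guidance, a per-word gate weighs how much visual information enters that word's representation, and a fully connected layer with Softmax produces the sentiment classes. All module names, dimensions, and design details here (GatedMultimodalFusion, d_model, the additive aspect bias) are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of aspect-guided cross-modal attention with a per-word
# dynamic gate, as described in the abstract. All names, dimensions, and
# design choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMultimodalFusion(nn.Module):
    """Fuse per-word text features with attended visual features via a
    dynamic gate, then classify sentiment with FC + Softmax."""
    def __init__(self, d_model=256, num_classes=3):
        super().__init__()
        # Cross-modal attention: words (queries) attend to image regions.
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Gate deciding, per word, how much visual information to admit.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, text_feats, image_feats, aspect_feats):
        # text_feats:   (B, T, d) word representations (e.g. from the
        #               context and syntax views, already combined)
        # image_feats:  (B, R, d) image-region representations
        # aspect_feats: (B, A, d) aspect-term representations
        # Condition each word on the aspect via a simple additive bias
        # (an assumed, simplified form of aspect guidance).
        aspect_ctx = aspect_feats.mean(dim=1, keepdim=True)          # (B, 1, d)
        queries = text_feats + aspect_ctx                            # (B, T, d)
        # Each word attends to the image regions.
        vis_per_word, _ = self.attn(queries, image_feats, image_feats)
        # Dynamic gate: contribution of visual information to each word.
        g = torch.sigmoid(self.gate(torch.cat([text_feats, vis_per_word], dim=-1)))
        fused = g * vis_per_word + (1 - g) * text_feats              # (B, T, d)
        # Pool over words, then FC + Softmax over sentiment classes.
        logits = self.classifier(fused.mean(dim=1))                  # (B, C)
        return F.softmax(logits, dim=-1)

# Toy usage with random features in place of real pre-trained encodings.
model = GatedMultimodalFusion()
probs = model(torch.randn(2, 20, 256), torch.randn(2, 49, 256), torch.randn(2, 3, 256))
print(probs.shape)  # torch.Size([2, 3])

The sigmoid gate is what makes the visual contribution dynamic: it is recomputed per word from the concatenated text and attended-image vectors, so opinion words near the aspect can admit more visual evidence than function words.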

Key words: multimodal aspect-level sentiment analysis, pre-trained model, multiview learning, multimodal interaction, dynamic fusion
