Computer Engineering and Applications, 2024, Vol. 60, Issue (22): 172-183. DOI: 10.3778/j.issn.1002-8331.2309-0082

• Pattern Recognition and Artificial Intelligence •

Multimodal Aspect-Level Sentiment Analysis Based on Multi-Granularity View Dynamic Fusion

YANG Ying, QIAN Xinyu, WANG Hening   

  1. School of Management, Hefei University of Technology, Hefei 230009, China
    2. Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei 230009, China
    3. Ministry of Education Engineering Research Centre for Intelligent Decision-Making & Information System Technologies, Hefei 230009, China
  • Online: 2024-11-15  Published: 2024-11-14

Abstract: To address the problems in previous research on multimodal aspect-level sentiment analysis of inadequate feature extraction, ineffective handling of data noise, and neglect of the complex interactions within multimodal data, a multi-granularity view dynamic fusion model (MVDFM) is proposed. Firstly, text and image data are encoded from both a coarse-grained and a fine-grained perspective, so as to fully capture data features and strengthen the model's capacity for information representation. Secondly, multi-granularity view features are extracted from the text and the image, and a dynamic gated self-attention mechanism is designed to denoise the fine-grained text and image views, further ensuring the quality of feature extraction. Finally, to exploit the complementarity and consistency among the views at different granularities, a triple-view factorized bilinear pooling mechanism is proposed that performs a two-stage dynamic fusion of the multi-granularity view features and yields the sentiment polarity of the target aspect term. Experimental results show that the model reaches an accuracy of 78.69% and an F1-score of 74.48% on the public Twitter-2015 dataset, and 72.77% and 71.61% on Twitter-2017, improvements over the best baseline model of 0.55 and 0.88 percentage points and of 1.67 and 2.45 percentage points, respectively. This indicates that the method can make full use of the deep semantic information contained in multimodal data and effectively mine the information relevant to the target aspect term, thereby improving aspect-level sentiment prediction.
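
To make the first step concrete: the fine-grained view of a sentence is usually the per-token encoder output, and the coarse-grained view a pooled sentence-level vector (likewise per-region versus global features for an image). The abstract does not name the encoders, so the following minimal PyTorch sketch assumes generic token or region features and derives the coarse view by mean pooling; the function name encode_views and both view names are illustrative, not the paper's.

    import torch

    def encode_views(fine_features: torch.Tensor):
        """Derive a fine- and a coarse-grained view from one encoder output.

        fine_features: (batch, seq_len, dim) token embeddings for text,
        or (batch, num_regions, dim) region embeddings for an image.
        """
        fine_view = fine_features                 # one vector per token/region
        coarse_view = fine_features.mean(dim=1)   # one pooled vector per sample
        return fine_view, coarse_view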
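
The abstract likewise gives no equations for the dynamic gated self-attention used to denoise the fine-grained views. A common realization is scaled dot-product self-attention followed by a learned sigmoid gate that decides, per position and channel, how much attended context to admit; positions where the gate closes keep their original features, which suppresses noisy context. The sketch below shows that pattern; the class name DynamicGatedSelfAttention and all layer shapes are assumptions, not the paper's implementation.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DynamicGatedSelfAttention(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            # The gate sees the original and the attended feature and
            # outputs values in (0, 1) per position and channel.
            self.gate = nn.Linear(2 * dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, dim) fine-grained token/region features
            q, k, v = self.q(x), self.k(x), self.v(x)
            scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
            attended = F.softmax(scores, dim=-1) @ v
            g = torch.sigmoid(self.gate(torch.cat([x, attended], dim=-1)))
            # Convex combination: attended context enters only where
            # the gate opens; elsewhere the clean input is kept.
            return g * attended + (1.0 - g) * x

For instance, DynamicGatedSelfAttention(768) applied to a (2, 49, 768) tensor of image-region features returns a denoised tensor of the same shape.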
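
Finally, the triple-view fusion can be pictured as multimodal factorized bilinear pooling extended from two inputs to three: each view is projected into a shared low-rank space, the projections are combined element-wise, and the result is sum-pooled over the rank dimension and normalized. The sketch below is written under that assumption; the paper's actual operator may differ, and the names TripleViewFactorizedPooling, out_dim, and rank are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TripleViewFactorizedPooling(nn.Module):
        def __init__(self, dims, out_dim: int, rank: int = 5):
            super().__init__()
            self.out_dim, self.rank = out_dim, rank
            # One low-rank projection per view, each to out_dim * rank.
            self.proj = nn.ModuleList(nn.Linear(d, out_dim * rank) for d in dims)

        def forward(self, a, b, c):
            # a, b, c: (batch, dims[i]) pooled features of the three views.
            fused = self.proj[0](a) * self.proj[1](b) * self.proj[2](c)
            # Sum-pool over the rank dimension, then apply the power and
            # L2 normalization customary in factorized bilinear pooling.
            fused = fused.view(-1, self.out_dim, self.rank).sum(dim=-1)
            fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-12)
            return F.normalize(fused, dim=-1)

In a two-stage arrangement, one such module could first fuse three coarse-grained views, and a second could fuse that result with the denoised fine-grained views before a linear classifier predicts the sentiment polarity of the target aspect term.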

Key words: multimodal aspect-level sentiment analysis, dynamic gated attention, multi-granularity views, dynamic fusion