Computer Engineering and Applications (计算机工程与应用), 2024, Vol. 60, Issue 22: 114-125. DOI: 10.3778/j.issn.1002-8331.2307-0431

• Pattern Recognition and Artificial Intelligence •

Cross-Modal Semantic Alignment and Information Refinement for Multi-Modal Sentiment Analysis

DING Meirong (丁美荣), CHEN Hongye (陈鸿业), ZENG Biqing (曾碧卿)

  1. School of Software, South China Normal University, Foshan, Guangdong 528225, China
  • Online: 2024-11-15 Published: 2024-11-14

Abstract: Multi-modal sentiment analysis suffers from a heterogeneity gap and a semantic gap between modalities, which prevent the modalities from being fused effectively. To address these problems, this paper proposes CM-SAIR (cross-modal semantic alignment and information refinement for multi-modal sentiment analysis), a framework based on the cross-modal Transformer. CM-SAIR targets multi-modal semantic misalignment and semantic noise so that multi-modal data can interact and fuse more effectively. A multi-modal feature embedding (MFE) module enhances the emotional information of the visual and audio modalities. A well-defined inter-modal semantic alignment (ISA) module aligns pairs of modalities along the temporal dimension. An intra-modal information refinement (IIR) module performs sentiment parsing and sentiment refinement. A multi-modal gated fusion (MGF) module then fuses the modalities. Experiments on popular multi-modal sentiment analysis datasets demonstrate the advantages of the CM-SAIR framework over state-of-the-art baselines.

Keywords: multi-modal feature embedding, semantic alignment, information refinement, multi-modal gated fusion, multi-modal sentiment analysis
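
The abstract names two mechanisms without giving formulas: alignment of two modalities along the temporal dimension, and gated fusion. The sketch below illustrates the generic form of each technique, not the paper's ISA or MGF modules. It assumes plain scaled dot-product attention with no learned projections, and a single scalar sigmoid gate; the function names, toy features, and gate weights are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention from a target modality to a source modality.

    Every query step attends over all source steps, so the result has one
    vector per query step: the source sequence is re-expressed on the
    target's time axis even when the two sequences differ in length.
    (Assumption: no learned query/key/value projections.)
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    aligned = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        aligned.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return aligned


def gated_fusion(a, b, gate_weights, gate_bias=0.0):
    """Blend two equal-length vectors with a scalar sigmoid gate.

    The gate is computed from the concatenation of both inputs.
    (Assumption: one scalar gate per time step, hypothetical weights.)
    """
    z = dot(gate_weights, a + b) + gate_bias
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * x + (1.0 - g) * y for x, y in zip(a, b)]


# Toy data: 2 text steps and 3 audio steps, feature dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.1], [0.2, 0.9], [0.4, 0.4]]

# Audio features re-expressed on the text time axis (2 steps instead of 3).
audio_on_text_axis = cross_modal_attention(text, audio, audio)

# Step-wise fusion of text with the aligned audio features.
fused = [
    gated_fusion(t, a, gate_weights=[0.1, 0.1, 0.1, 0.1])
    for t, a in zip(text, audio_on_text_axis)
]
```

In this sketch the attention output always has as many steps as the query sequence, which is what makes a step-wise gate possible afterwards. A trained model would learn the projections and gate parameters rather than fixing them by hand.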