Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (19): 137-146. DOI: 10.3778/j.issn.1002-8331.2406-0376

• Pattern Recognition and Artificial Intelligence •

Multimodal Sentiment Analysis of Global Semantic Information Enhancement Under Multi-Channel Interaction

BU Yunyang, BU Fanliang, ZHANG Zhijiang   

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
    2. First Research Institute of the Ministry of Public Security, Beijing 100048, China
  • Online: 2025-10-01  Published: 2025-09-30

Abstract: Humans often express emotion through multiple channels when communicating, including text, audio, and visuals. Judging sentiment from a single modality may yield biased results, whereas combining multiple cues allows a more comprehensive understanding of the message. However, most previous multimodal sentiment analysis methods only analyze the sentiment within individual image-text posts, ignoring the co-occurrence features shared across the image-text posts in a dataset. To address this problem, a multimodal sentiment analysis model with global semantic information enhancement under multi-channel interaction is proposed. Firstly, a text-guided multi-channel interaction module is designed to facilitate interaction between the text features of a single image-text post and the object view and scene view of its image. Secondly, a text-level graph neural network and a textual attribute-level graph neural network are constructed to learn the global co-occurrence features of individual modalities and of multiple modalities. Finally, a multi-source representation module fuses the resulting feature representations to achieve multimodal fusion. Extensive experiments on the publicly available multimodal sentiment analysis datasets MVSA-Single, MVSA-Multiple, and TumEmo demonstrate that the model outperforms a range of baseline models.
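
The text-guided multi-channel interaction and the multi-source fusion outlined above can be pictured with a minimal PyTorch sketch. This is an illustrative assumption rather than the paper's implementation: the class names, feature dimensions, the use of standard multi-head cross-attention for the two channels, and the concatenation-based fusion with a precomputed global co-occurrence vector (standing in for the output of the text-level and attribute-level graph neural networks) are all hypothetical.

# Hypothetical sketch: text tokens act as queries over an object view and a
# scene view of the same image; the pooled interaction feature is then fused
# with a dataset-level co-occurrence feature before sentiment classification.
import torch
import torch.nn as nn


class TextGuidedMultiChannelInteraction(nn.Module):
    """Text features attend separately to object-view and scene-view image features."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.object_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scene_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, obj_view, scene_view):
        # text:       (B, L_t, D) token-level text features
        # obj_view:   (B, L_o, D) object-level image features (e.g. detected regions)
        # scene_view: (B, L_s, D) scene-level image features (e.g. grid patches)
        obj_ctx, _ = self.object_attn(text, obj_view, obj_view)
        scene_ctx, _ = self.scene_attn(text, scene_view, scene_view)
        # Residual combination of the two interaction channels with the text.
        return self.norm(text + obj_ctx + scene_ctx)


class MultiSourceFusion(nn.Module):
    """Fuse the interaction features with a global co-occurrence feature and classify."""

    def __init__(self, dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def forward(self, interaction_feat, global_feat):
        # interaction_feat: (B, L_t, D) pooled to (B, D); global_feat: (B, D)
        pooled = interaction_feat.mean(dim=1)
        return self.classifier(torch.cat([pooled, global_feat], dim=-1))


if __name__ == "__main__":
    B, D = 2, 256
    interact = TextGuidedMultiChannelInteraction(dim=D)
    fuse = MultiSourceFusion(dim=D, num_classes=3)
    text = torch.randn(B, 20, D)        # text tokens
    obj_view = torch.randn(B, 10, D)    # object-view features
    scene_view = torch.randn(B, 49, D)  # scene-view features
    global_feat = torch.randn(B, D)     # e.g. output of the co-occurrence GNNs
    logits = fuse(interact(text, obj_view, scene_view), global_feat)
    print(logits.shape)                 # torch.Size([2, 3])

Under these assumptions, the two attention channels play the role of the multi-channel interaction, while the global vector stands in for the graph-derived co-occurrence features; the actual model may differ in both structure and fusion strategy.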

Key words: multimodal sentiment analysis, multi-channel interaction, graph neural networks, information enhancement