Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (22): 148-158. DOI: 10.3778/j.issn.1002-8331.2411-0130

• Pattern Recognition and Artificial Intelligence •

ADMIC: Multimodal Metaphor Detection Based on Intrinsic Correlation of Alternating Dominant Modalities

GUO Shisong, YANG Qimeng, YAN Yuanbo, HE Xiaoyu   

1. School of Software, Xinjiang University, Urumqi 830099, China
    2. School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China
• Online: 2025-11-15  Published: 2025-11-14

Abstract: Multimodal metaphor detection relies on non-textual modalities to complement the textual modality, yet current detection models tend to focus excessively on text, neglecting the crucial role of the visual modality in capturing metaphorical semantics, and they lack effective strategies for cross-modal interaction and fusion. To address these issues, this paper proposes a multimodal deep neural network based on the intrinsic correlation of alternating dominant modalities (ADMIC). The model alternately designates the dominant modality between the text and image modalities; by switching the dominant modality, it fully exploits the intrinsic correlation and complementarity of the two modalities and achieves bidirectional interactive fusion of metaphorical information. Compared with fusion methods that fix the dominant modality, the alternating strategy adapts more flexibly to how metaphorical semantics are distributed across modalities, improving both detection performance and generalization. A multi-level fusion strategy is further designed to strengthen modal fusion. Experiments on the Met-meme dataset show that ADMIC improves the F1-score by 1.32, 3.15, and 2.29 percentage points on the English, Chinese, and bilingual subsets, respectively, significantly outperforming conventional methods. The results confirm the model's advantage in capturing metaphorical semantics, and its strong performance on a multimodal sarcasm dataset further demonstrates its generalization ability.
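
The abstract gives no implementation details, but the alternating dominant-modality fusion it describes can be sketched with standard cross-attention: one pass treats text as the dominant modality (text queries attend to image features), a second pass swaps the roles, and the two fused views are combined. The sketch below is illustrative only; the module name, the BERT/ViT-style encoder outputs, the mean pooling, and the binary classifier head are all assumptions for exposition, not the authors' ADMIC implementation.

import torch
import torch.nn as nn

class AlternatingDominantFusion(nn.Module):
    """Illustrative sketch (not the authors' code) of alternating
    dominant-modality fusion: text-dominant and image-dominant
    cross-attention passes, followed by concatenation and classification."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # One cross-attention block per direction; batch_first keeps
        # tensors shaped (batch, seq_len, dim).
        self.cross_attn_t2v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_v2t = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 2)  # metaphor / non-metaphor

    def forward(self, text_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # Text-dominant pass: text queries attend to image keys/values.
        t_dom, _ = self.cross_attn_t2v(text_feats, img_feats, img_feats)
        # Image-dominant pass: image queries attend to text keys/values.
        v_dom, _ = self.cross_attn_v2t(img_feats, text_feats, text_feats)
        # Pool each fused sequence and concatenate the two views.
        fused = torch.cat([t_dom.mean(dim=1), v_dom.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Toy usage with random stand-ins for encoder outputs (e.g. BERT / ViT):
text = torch.randn(4, 32, 768)   # (batch, text tokens, dim)
image = torch.randn(4, 49, 768)  # (batch, image patches, dim)
logits = AlternatingDominantFusion()(text, image)
print(logits.shape)  # torch.Size([4, 2])

The key design point the sketch captures is that neither modality is permanently privileged: each takes a turn as the query side of the cross-attention, so metaphorical cues carried mainly by the image are not filtered through a fixed text-dominant view.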

Key words: multimodal, metaphor detection, feature fusion, attention mechanism, Met-meme