Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (11): 180-187. DOI: 10.3778/j.issn.1002-8331.2203-0036

• Pattern Recognition and Artificial Intelligence •

Image-Text Fusion Sentiment Analysis Method Based on Image Semantic Translation

HUANG Jian, WANG Ying   

  1. College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710600, China
  • Online: 2023-06-01    Published: 2023-06-01

Abstract: In multimodal sentiment analysis, the same image can convey different sentiments under different circumstances or when attention is paid to different regions. To address this image semantic understanding problem, an image-text fusion sentiment analysis method based on image semantic translation (ImaText-IST) is proposed. First, the image is fed into an image translation module that translates it into image captions; the module incorporates different emotional expressions into caption generation and produces captions for three sentiment polarities: positive, neutral, and negative. Then, sentiment correlation analysis is performed between the three polarity-specific captions and the text in the dataset, which makes the understanding of image sentiment more accurate. Finally, sentiment prediction is carried out on the image semantic captions, the targets, and the text, using either feature fusion or auxiliary sentences. Experimental results show that the auxiliary-sentence variant (Axu-ImaText-IST) better captures the sentiment of image-text pairs, and its accuracy and Macro-F1 on the social media datasets Twitter-15 and Twitter-17 are both higher than those of the baseline models.
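
The sketch below is a minimal illustration, not the authors' released code, of the two steps the abstract outlines: ranking the three polarity-specific image captions by their correlation with the post text, and turning the best-matching caption and the target into an auxiliary sentence for a sentence-pair sentiment classifier. The bag-of-words similarity, the sentence template, and names such as select_caption and build_auxiliary_sentence are illustrative assumptions standing in for the paper's learned components.

```python
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a placeholder for the learned correlation score."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def select_caption(captions: dict, text: str):
    """Pick the polarity-specific image caption most correlated with the post text."""
    return max(captions.items(), key=lambda kv: cosine_similarity(kv[1], text))


def build_auxiliary_sentence(caption: str, target: str) -> str:
    """Hypothetical template joining the chosen caption and the target into an auxiliary sentence."""
    return f"The image shows {caption}. What is the sentiment toward {target}?"


if __name__ == "__main__":
    # Captions produced by the image translation module, one per sentiment polarity.
    captions = {
        "positive": "a cheerful crowd celebrating at a concert",
        "neutral": "a group of people standing in a square",
        "negative": "an angry crowd protesting in the rain",
    }
    text = "Such a cheerful crowd at the concert tonight, amazing vibes!"
    target = "the concert"

    polarity, caption = select_caption(captions, text)
    auxiliary = build_auxiliary_sentence(caption, target)
    # The (auxiliary sentence, text) pair would then go to a BERT-style
    # sentence-pair classifier for the final target-dependent prediction.
    print(polarity, "|", auxiliary)
```

In the method described by the abstract, the correlation scoring and caption generation are learned modules; the toy similarity and fixed template above only make the data flow of the auxiliary-sentence variant concrete.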

Key words: image-text fusion, multimodal sentiment analysis, image caption, emotional correlation