Computer Engineering and Applications, 2024, Vol. 60, Issue (1): 165-173. DOI: 10.3778/j.issn.1002-8331.2207-0498

• Pattern Recognition and Artificial Intelligence •

Cross-Modal Emotion Analysis of Semantic and Spatio-Temporal Dynamic Interaction

QU Licheng, QIE Liyuan, LIU Zijun, WEI Si, DONG Zhewei   

  1. School of Information Engineering, Chang’an University, Xi’an 710064, China
  • Online: 2024-01-01 Published: 2024-01-01




Abstract: To address the weak interaction between modalities and the limited fusion of spatial and temporal features in traditional sentiment analysis, a cross-modal semantic and spatio-temporal dynamic interaction network is proposed. A bi-directional long short-term memory network mines the time-series features of each modality, and a self-attention mechanism strengthens the weight distribution of features within each modality. The automatically screened feature matrix is then fed into a graph convolutional neural network for semantic interaction. Next, features are aggregated by timestamp, the correlation coefficient of the aggregation layer is calculated, and the fused joint features are obtained to realize cross-modal spatial interaction. Finally, emotional polarity is classified and predicted. The proposed model is evaluated on public datasets. The experimental results show that multimodal time-series extraction, combined with the cross-modal semantic and spatial interaction mechanism, achieves fully dynamic fusion of intra-modal and inter-modal features and effectively improves the accuracy and F1 score of sentiment classification. On the CMU-MOSEI dataset, accuracy and F1 score increase by 1.7%~13.5% and 2.1%~14.0%, respectively, demonstrating good robustness and advanced performance.
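The abstract gives no formulas, so the following is only a minimal sketch of two of the stages it names: intra-modal self-attention and timestamp-based aggregation weighted by a correlation coefficient. Everything in it is an assumption made for illustration. The function names (`self_attention`, `pearson`, `fuse_by_timestamp`) are hypothetical, the attention uses identity projections rather than learned weights, and the fusion rule (a softmax over each modality's mean Pearson correlation with the others) is a stand-in rather than the paper's actual method. The BiLSTM and graph convolution stages are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    Assumption: queries, keys and values are the raw inputs (no learned
    projections), so each output row is a convex combination of the inputs.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(w[t] * seq[t][j] for t in range(len(seq)))
                    for j in range(d)])
    return out


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def fuse_by_timestamp(modalities):
    """Fuse timestamp-aligned modality features into one vector per step.

    modalities: dict mapping a modality name to a list of equal-length
    feature vectors, one per timestamp. Hypothetical rule: at each
    timestamp, weight every modality by the softmax of its mean Pearson
    correlation with the other modalities, then take the weighted sum.
    """
    names = list(modalities)
    if len(names) < 2:
        raise ValueError("need at least two modalities to correlate")
    steps = len(modalities[names[0]])
    fused = []
    for t in range(steps):
        vecs = [modalities[n][t] for n in names]
        affinity = [
            sum(pearson(v, u) for j, u in enumerate(vecs) if j != i)
            / (len(vecs) - 1)
            for i, v in enumerate(vecs)
        ]
        w = softmax(affinity)
        fused.append([sum(w[i] * vecs[i][j] for i in range(len(vecs)))
                      for j in range(len(vecs[0]))])
    return fused


# Toy, made-up features: three modalities, two timestamps, dimension 3.
features = {
    "text":  [[0.2, 0.9, 0.1], [0.4, 0.7, 0.3]],
    "audio": [[0.1, 0.8, 0.2], [0.9, 0.1, 0.5]],
    "video": [[0.3, 0.7, 0.0], [0.5, 0.6, 0.2]],
}
attended = {name: self_attention(seq) for name, seq in features.items()}
joint = fuse_by_timestamp(attended)
```

In this sketch a modality that agrees with the others at a given timestamp receives a larger fusion weight. A trained model would instead learn the attention projections and the fusion parameters end to end.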

Key words: cross-modal sentiment analysis, semantic interaction, spatio-temporal interaction, bi-directional long short-term memory, graph convolutional network

