Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (23): 110-125. DOI: 10.3778/j.issn.1002-8331.2510-0100

• Theory and Research & Development •


Method for Analyzing Satisfaction with Online Videos Based on Multimodal Emotional Data

WANG Anqi, LI Mingxuan, CHENG Boxuan   

  1. School of Criminal Investigation, People’s Public Security University of China, Beijing 100038, China
  • Online:2025-12-01 Published:2025-12-01



Abstract: With the rapid development of the internet and video platforms, online video content has become increasingly diverse, and effectively evaluating user satisfaction with different types of online videos has become a key problem in video content promotion and human-computer interaction research. Although multimodal sentiment analysis methods that integrate text, audio, and visual information have been widely applied to user emotion recognition, emotional states alone cannot fully reflect a user’s overall experience of the content. Existing research often stops at modeling emotional polarity and rarely examines the mechanisms linking emotion to satisfaction, so satisfaction, a higher-order psychological construct, has long been overlooked. To assess users’ holistic emotional responses to online videos more accurately, this paper proposes MVSA (multimodal video satisfaction analysis), a video satisfaction analysis framework based on multimodal fusion. It also constructs MVS-Eval (multimodal video satisfaction evaluation), a multimodal dataset designed for research on online video user satisfaction, which covers satisfaction labels across multiple dimensions, including attractiveness, concentration, and engagement, and aims to comprehensively model users’ subjective feedback on video content. The paper further proposes MUSE (multimodal understanding for satisfaction estimation), a multimodal satisfaction estimation algorithm based on modality-consistency training and a satisfaction-guided fusion mechanism, which effectively establishes the emotion-satisfaction link and improves the model’s satisfaction prediction performance and cross-scenario generalization. In addition, the MVSA framework integrates an intelligent feedback processing platform that automatically parses user feedback videos and generates structured satisfaction evaluation results. Experimental results show that MUSE significantly outperforms existing mainstream models on multiple benchmark tasks, validating its effectiveness and interpretability in modeling satisfaction for diverse types of online videos.

Key words: online video, multimodal data, satisfaction analysis
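The two mechanisms named in the abstract, satisfaction-guided fusion and modality-consistency training, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`fuse`, `consistency_loss`) and the specific forms (softmax-gated weighted fusion, mean-squared deviation of per-modality satisfaction scores from their mean) are illustrative assumptions.

```python
# Hypothetical sketch: softmax gates weight each modality's feature vector
# before fusion, and a consistency term penalizes disagreement among the
# per-modality satisfaction estimates. Pure-Python, dependency-free.
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(modality_feats, gate_scores):
    """Satisfaction-guided fusion (sketch): weighted sum of per-modality
    feature vectors, with weights given by softmax over the gate scores."""
    weights = softmax(gate_scores)
    dim = len(modality_feats[0])
    return [
        sum(weights[m] * modality_feats[m][d] for m in range(len(weights)))
        for d in range(dim)
    ]

def consistency_loss(per_modality_preds):
    """Modality-consistency penalty (sketch): mean squared deviation of each
    modality's satisfaction score from the cross-modality mean."""
    mu = sum(per_modality_preds) / len(per_modality_preds)
    return sum((p - mu) ** 2 for p in per_modality_preds) / len(per_modality_preds)

# Usage: three modalities (text, audio, visual) with 2-d features and
# equal gate scores reduce to a simple average of the feature vectors.
fused = fuse([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.0, 0.0, 0.0])
penalty = consistency_loss([0.0, 1.0])
```

In a trained model the gate scores would themselves be predicted from the modality features, so modalities judged more informative for satisfaction dominate the fused representation, while the consistency term discourages the branches from drifting apart during training.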