Computer Engineering and Applications, 2025, Vol. 61, Issue (14): 248-255. DOI: 10.3778/j.issn.1002-8331.2405-0312

• Graphics and Image Processing •

Lightweight Video Saliency Prediction Model Driven by Spatio-Temporal Octave Convolution Module

DAI Yixuan (戴怡萱), HAN Bing (韩冰), GAO Xinbo (高新波), HAN Yiyuan (韩怡园)

  1. School of Electronic Engineering, Xidian University, Xi’an 710071, China
  2. Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2025-07-15 Published: 2025-07-15

Abstract: Video saliency prediction, which models where the human eye fixates, is an important task for applications such as video editing, virtual reality and autonomous driving. Conventional methods rely on large networks, which limits their use on resource-constrained devices. To address this problem, a lightweight network is proposed. Its lightweight spatio-temporal multi-scale octave convolution module reduces the number of parameters and the computational requirements, improving efficiency while maintaining performance. Experimental results show that, on resource-constrained devices, the lightweight network achieves performance comparable to or even better than that of conventional methods, with lower computational overhead and faster inference, and that its predictions are more consistent with real human eye movement behaviour.

Key words: video saliency prediction, deep learning, lightweight model, 3D convolution