Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (14): 297-306. DOI: 10.3778/j.issn.1002-8331.2404-0251

• Graphics and Image Processing •

面向超像素块级记忆学习的视频异常检测

谢斌红,王乾,张睿,张英俊,陆望东   

  1. 太原科技大学 计算机科学与技术学院,太原 030024
  2. 山西天河云计算有限公司,山西 吕梁 033000
  • 出版日期:2025-07-15 发布日期:2025-07-15

Superpixel Block-Level Memory Learning Oriented Video Anomaly Detection

XIE Binhong, WANG Qian, ZHANG Rui, ZHANG Yingjun, LU Wangdong   

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. Shanxi Tianhe Cloud Computing Co., Ltd., Lyuliang, Shanxi 033000, China
  • Online: 2025-07-15 Published: 2025-07-15

摘要: 针对当前视频异常检测模型中网络泛化能力过强,导致某些异常帧也能被很好预测的问题,提出一种面向超像素块级记忆学习的视频异常检测方法(superpixel block-level memory learning oriented video anomaly detection,SBM-VAD)。通过超像素分割算法SLIC调整特征图的分割粒度,以在实时原型记忆库(real-time prototype memory bank,RTPMB)中存储更精细的原型粒度,从而增强正常帧与异常帧在特征表示中的区分性。在U-Net预测网络的每层跳跃连接处引入一个与分割大小相匹配的实时原型记忆库,以避免模型学习输入到输出的恒等映射,有效约束传入解码器的特征。此外,采用一种基于软注意力策略的信号去噪模块(signal denoising module,SDM),引导模型优先学习预测前景区域,从而获得具有高质量前景的预测帧。在UCSD、CUHK Avenue和Shanghai Tech公开数据集上进行实验验证,结果表明改进后的模型在异常检测能力和效果方面均有显著提升。

关键词: 视频异常检测, 过泛化能力, 原型粒度, 恒等映射, 信号去噪

Abstract: In current video anomaly detection models, the network often generalizes too well, so some abnormal frames are also predicted accurately. To address this problem, a superpixel block-level memory learning oriented video anomaly detection method (SBM-VAD) is proposed. First, the superpixel segmentation algorithm SLIC is used to adjust the segmentation granularity of the feature maps, so that finer-grained prototypes are stored in a real-time prototype memory bank (RTPMB), which makes normal and abnormal frames more distinguishable in the feature representation. Second, an RTPMB matched to the segmentation size is introduced at the skip connection of each layer of the U-Net prediction network; this prevents the model from learning an identity mapping from input to output and effectively constrains the features passed to the decoder. In addition, a signal denoising module (SDM) based on a soft attention strategy guides the model to prioritize predicting foreground regions, yielding predicted frames with a high-quality foreground. Finally, experiments on the public UCSD, CUHK Avenue, and ShanghaiTech datasets show that the improved model achieves significantly better anomaly detection performance.

Keywords: video anomaly detection, over-generalization, prototype granularity, identity mapping, signal denoising
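
Illustrative sketch: The abstract does not give the read rule of the prototype memory bank, so the following Python sketch only illustrates the general idea of block-level memory reading under stated assumptions. It is not the authors' implementation. The function names (`block_means`, `read_memory`), the cosine-similarity softmax addressing, and the `temperature` value are all hypothetical. The superpixel labels, which SLIC would produce in the paper, are hard-coded here.

```python
import math

def block_means(features, labels):
    """Average the feature vectors that share a superpixel label.

    features: 2-D grid (list of rows) of feature vectors.
    labels:   2-D grid of integer block labels (assumed to come from SLIC).
    """
    sums, counts = {}, {}
    for feat_row, label_row in zip(features, labels):
        for vec, lab in zip(feat_row, label_row):
            acc = sums.setdefault(lab, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def read_memory(query, prototypes, temperature=0.1):
    """Replace a query with a softmax-weighted mix of stored prototypes.

    Assumption: cosine-similarity addressing, as is common in
    memory-augmented models; the paper may use a different rule.
    """
    scores = [cosine(query, p) / temperature for p in prototypes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    retrieved = [sum(w * p[i] for w, p in zip(weights, prototypes))
                 for i in range(len(query))]
    return retrieved, weights

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy memory holding two "normal" prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]

# A 2x2 feature map with two superpixel blocks (labels 0 and 1).
features = [[[0.9, 0.1], [1.0, 0.0]],
            [[0.5, 0.5], [0.6, 0.4]]]
labels = [[0, 0],
          [1, 1]]

queries = block_means(features, labels)
errors = {}
for lab, query in queries.items():
    retrieved, weights = read_memory(query, prototypes)
    errors[lab] = l2(query, retrieved)
```

In this toy setup, block 0 lies close to a stored prototype and is retrieved almost unchanged, while block 1 matches no prototype well and ends up farther from its retrieved feature. Because the decoder only receives features rebuilt from stored normal prototypes, this is the kind of gap that can keep the network from passing abnormal content through the skip connections.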