Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (17): 216-223. DOI: 10.3778/j.issn.1002-8331.2305-0210

• Graphics and Image Processing •

Human Abnormal Behavior Detection Method Based on Improved Reconstruction and Prediction Network

ZHANG Hongmin, ZHUANG Xu, ZHENG Jingtian   

  1. School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Online: 2024-09-01  Published: 2024-08-30

Abstract: To make fuller use of action and spatio-temporal feature information in human abnormal behavior detection, a detection method based on a reconstruction and prediction network is proposed. The network consists of a reconstruction sub-network and a video prediction sub-network. The reconstruction sub-network adopts an autoencoder structure and takes consecutive video frames as input to reconstruct the next frame; the prediction sub-network uses a 3D-convolution-based encoder-decoder as its backbone and predicts subsequent frames from a sequence of input frames. In addition, to make the reconstruction sub-network focus more on the action characteristics of human behavior, the Jensen-Shannon divergence (JSD) is used to measure the difference between the reconstructed frame and the original frame, and a spatio-temporal consistency regularization constraint is added to the prediction sub-network. Experimental results on the UCSDped2, Avenue and ShanghaiTech datasets show that the proposed method outperforms other video-based human abnormal behavior detection methods on the AUC metric, reaching 97.3%, 91.1% and 82.6%, respectively.
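
The JSD-based frame-difference measure and the 3D-convolutional prediction backbone described above can be sketched as follows. This is a minimal, illustrative PyTorch-style sketch, not the authors' implementation: the jsd_loss and Predictor3D names, the softmax normalisation that turns each frame into a probability distribution, the layer widths, and the choice of the last temporal slice as the predicted frame are all assumptions made here for clarity.

```python
# Illustrative sketch only; not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def jsd_loss(recon: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Jensen-Shannon divergence between reconstructed and original frames.

    Both tensors have shape (B, C, H, W). Each frame is flattened and
    softmax-normalised into a probability distribution -- an assumption made
    here for illustration, since the abstract does not state the normalisation.
    """
    b = recon.size(0)
    p = F.softmax(recon.reshape(b, -1), dim=1)
    q = F.softmax(target.reshape(b, -1), dim=1)
    m = 0.5 * (p + q)
    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = (p * ((p + eps).log() - (m + eps).log())).sum(dim=1)
    kl_qm = (q * ((q + eps).log() - (m + eps).log())).sum(dim=1)
    return (0.5 * kl_pm + 0.5 * kl_qm).mean()


class Predictor3D(nn.Module):
    """Toy 3D-convolutional encoder-decoder: T input frames -> one predicted frame.

    Layer widths and depths are placeholders, not the paper's configuration.
    """

    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, in_ch, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, C, T, H, W); H and W are assumed divisible by 4.
        feat = self.encoder(clip)
        out = self.decoder(feat)
        return out[:, :, -1]  # last temporal slice as the predicted frame


if __name__ == "__main__":
    frames = torch.rand(2, 3, 8, 64, 64)        # batch of two 8-frame clips
    predicted = Predictor3D()(frames)           # (2, 3, 64, 64)
    loss = jsd_loss(predicted, frames[:, :, -1])
    print(predicted.shape, float(loss))
```

At test time, reconstruction- and prediction-based detectors of this kind typically flag a frame as abnormal when its reconstruction or prediction error is large; the exact scoring function and training configuration are not specified in the abstract.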

Key words: abnormal behavior detection, autoencoders, 3D convolution, spatiotemporal consistency