计算机工程与应用 ›› 2024, Vol. 60 ›› Issue (17): 216-223. DOI: 10.3778/j.issn.1002-8331.2305-0210

• Graphics and Image Processing •

Human Abnormal Behavior Detection Method Based on Improved Reconstruction and Prediction Networks

ZHANG Hongmin, ZHUANG Xu, ZHENG Jingtian

  1. School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Online:2024-09-01 Published:2024-08-30

Abstract: To make fuller use of motion and spatio-temporal feature information in human abnormal behavior detection, a detection method based on a reconstruction and prediction network is proposed. The network consists of a reconstruction sub-network and a video prediction sub-network: the reconstruction sub-network adopts an autoencoder structure and takes consecutive video frames as input to reconstruct the next frame, while the prediction sub-network uses a 3D-convolution-based encoder-decoder as its backbone and predicts subsequent frames from a sequence of input frames. In addition, to make the reconstruction sub-network focus more on the motion characteristics of human behavior, the Jensen-Shannon divergence (JSD) is used to measure the difference between the reconstructed frame and the original frame, and a spatio-temporal consistency regularization constraint is added to the prediction sub-network. Experimental results on the UCSDped2, Avenue and ShanghaiTech datasets show that the proposed method outperforms other video-based human abnormal behavior detection methods on the AUC metric, reaching 97.3%, 91.1% and 82.6%, respectively.
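
The abstract names the Jensen-Shannon divergence (JSD) as the measure of difference between the reconstructed frame and the original frame, but does not state how frames are turned into probability distributions. The sketch below is a minimal PyTorch version assuming each frame is flattened and normalised to sum to one; the function name, the normalisation scheme and the eps smoothing are illustrative choices, not details from the paper.

import torch

def jsd_reconstruction_loss(recon, target, eps=1e-8):
    # Jensen-Shannon divergence between reconstructed and original frames.
    # recon, target: non-negative tensors of shape (B, C, H, W).
    # Assumption: each frame is treated as a discrete distribution over its pixels.
    p = recon.flatten(start_dim=1)
    q = target.flatten(start_dim=1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)   # normalise to sum to 1
    q = q / (q.sum(dim=1, keepdim=True) + eps)

    m = 0.5 * (p + q)                            # mixture distribution M
    log_m = m.clamp_min(eps).log()

    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = (p * (p.clamp_min(eps).log() - log_m)).sum(dim=1)
    kl_qm = (q * (q.clamp_min(eps).log() - log_m)).sum(dim=1)
    return (0.5 * (kl_pm + kl_qm)).mean()        # average over the batch

In training, recon would be the autoencoder's reconstruction of the next frame and target the corresponding ground-truth frame; such a JSD term could be added to a pixel-wise reconstruction loss, although the exact loss composition and weighting are not given in the abstract.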

Key words: abnormal behavior detection, autoencoders, 3D convolution, spatiotemporal consistency
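
The prediction sub-network is described only as a 3D-convolution-based encoder-decoder that maps a sequence of input frames to predicted future frames. The PyTorch sketch below is a minimal stand-in consistent with that description; the layer count, channel widths, strides and the choice of returning the last decoded time step as the predicted frame are illustrative assumptions, not the paper's configuration, and the spatio-temporal consistency regularization is omitted.

import torch
import torch.nn as nn

class Conv3dPredictor(nn.Module):
    # Minimal 3D-convolutional encoder-decoder: takes a clip of consecutive
    # frames shaped (B, C, T, H, W) and predicts a frame shaped (B, C, H, W).
    def __init__(self, in_channels=1, base=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base, base * 2, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(base * 2, base, kernel_size=3, stride=(2, 2, 2),
                               padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base, in_channels, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
        )

    def forward(self, clip):
        feat = self.encoder(clip)    # compress the clip in space and time
        out = self.decoder(feat)     # expand back to the input resolution
        return out[:, :, -1]         # last decoded time step as the predicted frame

# Usage sketch: a batch of two 8-frame greyscale clips at 256x256 resolution.
clip = torch.randn(2, 1, 8, 256, 256)      # (B, C, T, H, W)
pred = Conv3dPredictor()(clip)             # -> (2, 1, 256, 256)

At test time, the per-frame prediction or reconstruction error is typically converted into an anomaly score and evaluated with the frame-level AUC, which is how the 97.3%, 91.1% and 82.6% figures on UCSDped2, Avenue and ShanghaiTech are reported.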