Computer Engineering and Applications, 2023, Vol. 59, Issue (1): 162-168. DOI: 10.3778/j.issn.1002-8331.2106-0376

• Pattern Recognition and Artificial Intelligence •

Research on Temporal None Padding Network Video Action Recognition Algorithm

LIU Zhao, YANG Fan, SI Yazhong   

  1. School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China
  • Online:2023-01-01 Published:2023-01-01

时域非填充网络视频行为识别算法研究

刘钊,杨帆,司亚中   

  1. 河北工业大学 电子信息工程学院,天津 300401

Abstract: Video action recognition is a fundamental problem in image and vision research. Among deep-learning-based action recognition models, 2D convolutional methods have few parameters but limited accuracy, while 3D convolutional methods improve accuracy to some extent at the cost of more parameters and computation. To reduce the parameter count and computational resource consumption of 3D convolutional action recognition models while maintaining accuracy, a temporal non-padding convolutional network algorithm is proposed. When 3D convolution is applied to video, no additional data is padded along the temporal dimension, which preserves the integrity of the temporal information. To make full use of the limited temporal information, a network structure suited to this padding scheme is designed: 3D convolution without temporal padding first extracts spatio-temporal information, and a network reorganization structure then converts the 3D convolution into 2D convolution to extract further features. Experimental results show that the network has 10.385×10⁶ parameters and reaches 60.28% accuracy on the UCF101 dataset without pretrained weights. Compared with other 3D convolutional action recognition methods, it has clear advantages in both resource usage and accuracy.

Key words: action recognition, machine vision, deep learning, 3D convolutional neural network
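
The padding scheme described in the abstract can be illustrated with simple shape arithmetic. The sketch below is not the authors' implementation. It uses the standard convolution output-length formula to show how the temporal length of a clip shrinks when 3D convolutions are applied without temporal padding, until a single time step remains and the feature map could be handled by 2D convolution. The clip length (9 frames), temporal kernel size (3), stride (1), and channel width (64) are illustrative assumptions, and the function names are hypothetical. The abstract does not specify the layer configuration or the details of the network reorganization structure.

```python
def conv_out_len(n, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def temporal_lengths(frames, kernel_t, num_layers):
    """Temporal length after each 3D conv layer with no temporal padding.

    Assumes stride 1 along time (an illustrative choice, not from the paper).
    """
    lengths = [frames]
    for _ in range(num_layers):
        frames = conv_out_len(frames, kernel_t)
        if frames < 1:
            raise ValueError("clip too short for this many unpadded layers")
        lengths.append(frames)
    return lengths


def conv_params(c_in, c_out, kernel_shape):
    """Number of weights plus biases in one convolution layer."""
    weights = c_in * c_out
    for k in kernel_shape:
        weights *= k
    return weights + c_out


if __name__ == "__main__":
    # Assumed 9-frame clip, temporal kernel 3: time axis shrinks 9, 7, 5, 3, 1.
    print(temporal_lengths(9, 3, 4))
    # With temporal padding of 1 the length would stay at 9 instead.
    print(conv_out_len(9, 3, padding=1))
    # Once time has collapsed to 1, a 3x3 2D kernel needs a third of the
    # weights of a 3x3x3 3D kernel at the same channel width.
    print(conv_params(64, 64, (3, 3, 3)), conv_params(64, 64, (3, 3)))
```

Under these assumptions, every unpadded layer consumes `kernel_t - 1` time steps, so the temporal axis disappears after a fixed number of layers. This is one plausible reading of why the later layers can be 2D and why the parameter count drops relative to a fully 3D network.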
