Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (4): 174-178. DOI: 10.3778/j.issn.1002-8331.1707-0347

• Pattern Recognition and Artificial Intelligence •

Atom action recognition by multi-dimensional adaptive 3D convolutional neural networks

GAO Dapeng, ZHU Jiangang

  1. School of Computer, Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China
  • Online: 2018-02-15 Published: 2018-03-07

Abstract: Existing action recognition algorithms based on 3D Convolutional Neural Networks (3DCNN) divide the input video into clips of fixed length, so a clip may contain redundant or incomplete action information. This paper proposes a solution to that problem. First, a human atom action is defined from the characteristics of human motion particle trajectories. The video is then divided using the length of each atom action as the clip length, so that every clip contains the complete information of one human action. The resulting clips differ in length, however, whereas 3DCNN requires all input data to have the same dimensions. To resolve this conflict, the 3D Spatial Pyramid Pooling (3D SPP) technique is improved so that it can process videos of different lengths. The SPP layer is placed before the fully-connected layers, where it processes the variable-length feature maps output by the 3DCNN convolutional layers and outputs feature vectors of the same length. Experimental comparison with related algorithms indicates two advantages: the algorithm places lower requirements on the input data, and, because each clip carries complete information, the recognition rate improves significantly.

Key words: action recognition, video analysis, 3D spatial pyramid pooling, atom action, 3D convolutional neural networks
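
The way a pyramid pooling layer maps feature maps of different temporal lengths to a vector of one fixed length can be illustrated with a minimal sketch. It is not the improved 3D SPP of the paper, whose details the abstract does not give; it is the generic spatial pyramid pooling idea extended to three axes. The function names `bin_edges`, `spp3d` and `random_volume`, the pyramid levels (1, 2, 4), the use of max pooling, and the choice to divide the time axis into the same number of bins as the spatial axes are all assumptions made for illustration.

```python
import math
import random


def bin_edges(size, bins):
    """Split range(size) into `bins` non-empty windows that together cover it.

    Neighbouring windows may overlap when `size` is not a multiple of `bins`.
    """
    return [(math.floor(i * size / bins), math.ceil((i + 1) * size / bins))
            for i in range(bins)]


def spp3d(volume, levels=(1, 2, 4)):
    """Max-pool one T x H x W feature map into a fixed-length vector.

    Pyramid level n divides every axis into n bins, so the output length is
    sum(n**3 for n in levels), independent of T, H and W.
    """
    t, h, w = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for n in levels:
        for t0, t1 in bin_edges(t, n):
            for h0, h1 in bin_edges(h, n):
                for w0, w1 in bin_edges(w, n):
                    # Keep the strongest response inside this 3D bin.
                    out.append(max(volume[a][b][c]
                                   for a in range(t0, t1)
                                   for b in range(h0, h1)
                                   for c in range(w0, w1)))
    return out


def random_volume(t, h, w, seed=0):
    """Hypothetical stand-in for one channel of a 3DCNN convolutional output."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(w)] for _ in range(h)]
            for _ in range(t)]


# Two clips of different temporal length yield vectors of the same length.
short_clip = random_volume(5, 8, 8, seed=1)
long_clip = random_volume(13, 8, 8, seed=2)
print(len(spp3d(short_clip)), len(spp3d(long_clip)))  # 73 73
```

With levels (1, 2, 4) each channel contributes 1 + 8 + 64 = 73 values, so the fully-connected layers that follow always receive input of the same size, whatever the length of the atom action clip.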