Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (1): 210-216. DOI: 10.3778/j.issn.1002-8331.1607-0105

Discriminative spatio-temporal pyramid compact representations algorithm

CUI Xuehong, LIU Yun, WANG Chuanxu, LI Hui   

  1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China
  • Online: 2018-01-01 Published: 2018-01-15


崔雪红,刘  云,王传旭,李  辉   

  1. 青岛科技大学 信息科学技术学院,山东 青岛 266061

Abstract: In Spatio-Temporal Pyramid Representation (STPR), the video is divided into increasingly fine cubic unit cells at each pyramid level. Local features are extracted from every cubic unit cell and concatenated into a single high-dimensional feature vector, so training and testing incur high computational costs. Moreover, because the partitioning of the video is designed by hand, there is little theoretical basis for choosing the partitioning that is best for behavior recognition. This paper proposes discriminative STPR, a new representation that constructs the video feature as a weighted sum of semi-local features over all pyramid levels. The weights are calculated automatically with the partial least squares (PLS) method so as to maximize discriminative power. The resulting representation is compact and retains high discriminative power. Furthermore, inspecting the optimal weights learned for the fine cubic unit cells reveals both the distinctive cubic unit cells and the appropriate number of pyramid levels.
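
To make the contrast between concatenation and weighted summation concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. The abstract does not state the exact PLS objective, so the sketch assumes one common PLS-style criterion: choose unit-norm cell weights that maximize the trace of the between-class scatter of the pooled feature, which reduces to taking the top eigenvector of a small cell-by-cell matrix. The function names `learn_cell_weights` and `pool_cells`, and the array layout `(videos, cells, feature_dim)`, are hypothetical.

```python
import numpy as np


def learn_cell_weights(cell_feats, labels):
    """Learn one weight per cubic unit cell (assumed PLS-style criterion).

    cell_feats: array of shape (N, C, D), one D-dim feature per cell per video,
                with the C cells gathered from all pyramid levels.
    labels:     array of shape (N,), integer class labels.
    Returns a unit-norm weight vector of shape (C,).
    """
    cell_feats = np.asarray(cell_feats, dtype=float)
    labels = np.asarray(labels)
    n_cells = cell_feats.shape[1]

    global_mean = cell_feats.mean(axis=0)            # (C, D)
    scatter = np.zeros((n_cells, n_cells))           # cell-by-cell matrix B
    for k in np.unique(labels):
        members = cell_feats[labels == k]
        diff = members.mean(axis=0) - global_mean    # (C, D) class-mean offset
        # B[i, j] accumulates n_k * <offset of cell i, offset of cell j>
        scatter += len(members) * diff @ diff.T

    # w^T B w equals the between-class scatter trace of the pooled feature,
    # so the maximizer under ||w|| = 1 is the top eigenvector of B.
    _, eigvecs = np.linalg.eigh(scatter)             # eigenvalues ascending
    weights = eigvecs[:, -1]
    if weights.sum() < 0:                            # resolve sign ambiguity
        weights = -weights
    return weights


def pool_cells(cell_feats, weights):
    """Weighted sum over cells: (N, C, D) -> (N, D) instead of (N, C * D)."""
    return np.einsum("c,ncd->nd", weights, np.asarray(cell_feats, dtype=float))
```

Under these assumptions the pooled descriptor has dimension D regardless of how many cells the pyramid contains, whereas concatenation yields C × D, and cells with near-zero weight can be read as contributing little to discrimination.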

Key words: Spatio-Temporal Pyramid Representation (STPR), Partial Least Squares (PLS) method, sparse coding, pooling, low-level feature

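Abstract (Chinese version, translated): When a conventional spatio-temporal pyramid has many levels, the dimensionality of the feature descriptor becomes very high, which makes such descriptors computationally inefficient in both training and testing. In addition, the layering of the pyramid and the division of each level into cubic unit cells are still done by hand, so the video partitioning strategy lacks a strong theoretical basis. To address these shortcomings, a highly discriminative, compact spatio-temporal pyramid descriptor algorithm is proposed. The new descriptor is a weighted sum of the local features of every cubic unit cell over all pyramid levels, rather than a concatenation of all cell descriptors into one huge descriptor. The weight of each cubic unit cell is obtained automatically by the partial least squares method, so the resulting global video descriptor is compact and highly discriminative. Moreover, observing the weights of the fine cubic unit cells shows the contribution of each cubic unit cell and of each pyramid level, so the video can be partitioned automatically according to the weights. Experiments on the HMDB51 and YouTube action datasets show that, compared with the spatio-temporal pyramid descriptor and the super sparse coding vector, this descriptor is compact and achieves good recognition results at low dimensionality.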

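Key words (Chinese version, translated): spatio-temporal pyramid, partial least squares method, sparse coding, pooling, low-level feature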