Transferable Dictionary Learning Fused Data Augmentation

doi:10.3778/j.issn.1002-8331.2001-0041

Abstract

A transferable dictionary method is proposed to address the shortage of labeled samples in complex action datasets. The method uses simple actions as the source domain to assist in recognizing complex actions, each of which is composed of a series of simple actions. Low-level video features are extracted with dense trajectories. Dictionary learning then yields sparse representations of the simple and the complex actions, and a transformation matrix uses the sparse representation of the simple actions to improve that of the complex actions. As a result, the transferable dictionary obtains more effective features even when little labeled complex-action data is available. In addition, a generative adversarial network (GAN) performs data augmentation at the feature level, which helps to learn a dictionary with stronger representation ability. The method is evaluated on the UCF101 and HMDB51 datasets and, with small sample sizes, achieves better recognition results than existing methods, which demonstrates its effectiveness.

Key words: complex action recognition, transferable dictionary, feature augmentation
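
The sparse-representation step mentioned in the abstract can be illustrated with a generic sketch. The code below is not the authors' algorithm: it assumes a fixed, randomly generated dictionary `D` and synthetic feature vectors `X` (both hypothetical stand-ins for a learned dictionary and dense-trajectory features), and it computes sparse codes with plain iterative shrinkage-thresholding (ISTA). The transformation matrix and the GAN-based feature augmentation are not modeled here.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_objective(X, D, A, lam):
    """0.5 * ||X - D A||_F^2 + lam * ||A||_1."""
    return 0.5 * np.sum((X - D @ A) ** 2) + lam * np.sum(np.abs(A))


def ista_sparse_code(X, D, lam=0.05, n_iter=200):
    """Approximately minimize the lasso objective over A with D held fixed.

    X: (n_features, n_samples) data matrix, one sample per column.
    D: (n_features, n_atoms) dictionary, one atom per column.
    """
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A


# Synthetic data (assumption): 5 samples built from 3 atoms each.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
A_true = np.zeros((50, 5))
for j in range(5):
    idx = rng.choice(50, size=3, replace=False)
    A_true[idx, j] = rng.standard_normal(3) + 2.0
X = D @ A_true

lam = 0.05
A = ista_sparse_code(X, D, lam=lam)
print("objective at zero codes:", lasso_objective(X, D, np.zeros_like(A), lam))
print("objective after ISTA:   ", lasso_objective(X, D, A, lam))
```

In a full dictionary-learning pipeline the dictionary itself would also be updated, alternating with this coding step. The sketch keeps `D` fixed so that only the sparse-coding idea is shown.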

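Abstract (translated from the Chinese):

A transferable dictionary is proposed to address the shortage of labeled samples in complex action datasets. The proposed method uses simple actions as the source domain to assist in recognizing complex actions composed of a series of simple actions. Low-level video features are extracted with dense trajectories. Dictionary learning obtains the corresponding sparse representations separately from the low-level features of the simple and the complex actions, and the sparse representation of the simple actions is used, through a transfer matrix, to improve the sparse representation of the complex actions. Consequently, even when labeled complex-action samples are scarce, the transferable dictionary obtains more effective high-level features. In addition, a GAN performs data augmentation at the feature level, which helps to learn a dictionary with stronger representation ability. The proposed method is evaluated on the UCF101 and HMDB51 datasets and, with small sample sizes, achieves better recognition results than existing methods, which demonstrates its effectiveness.

Key words: complex action recognition, transferable dictionary, feature augmentation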

WANG Ziru, LI Zhenmin. Transferable Dictionary Learning Fused Data Augmentation[J]. Computer Engineering and Applications, 2021, 57(23): 193-199.

王子儒，李振民. 融合数据增强的迁移字典学习[J]. 计算机工程与应用, 2021, 57(23): 193-199.
