[1] HAMAD A R, WOO W L, WEI B, et al. Overview of human activity recognition using sensor data[C]//Advances in Computational Intelligence Systems. Cham: Springer Nature Switzerland, 2024: 380-391.
[2] MENG Z Z, ZHANG M X, GUO C X, et al. Recent progress in sensing and computing techniques for human activity recognition and motion analysis[J]. Electronics, 2020, 9(9): 1357.
[3] CHENG G, WAN Y, SAUDAGAR A N, et al. Advances in human action recognition: a survey[J]. arXiv:1501.05964, 2015.
[4] JING L L, PARAG T, WU Z, et al. VideoSSL: semi-supervised learning for video classification[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 1109-1118.
[5] SHI B F, DAI Q, HOFFMAN J, et al. Temporal action detection with multi-level supervision[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 8002-8012.
[6] SHI B F, DAI Q, MU Y D, et al. Weakly-supervised action localization by generative attention modeling[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1006-1016.
[7] SINGH A, CHAKRABORTY O, VARSHNEY A, et al. Semi-supervised action recognition with temporal contrastive learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10384-10394.
[8] BEAUCHEMIN S S, BARRON J L. The computation of optical flow[J]. ACM Computing Surveys, 1995, 27(3): 433-466.
[9] XIAO J F, JING L L, ZHANG L, et al. Learning from temporal gradient for semi-supervised action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 3242-3252.
[10] GAO G Y, LIU Z M, ZHANG G J, et al. DANet: semi-supervised differentiated auxiliaries guided network for video action recognition[J]. Neural Networks, 2023, 158: 121-131.
[11] XU Y H, WEI F Y, SUN X, et al. Cross-model pseudo-labeling for semi-supervised action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 2949-2958.
[12] DASS S D S, BARUA H B, KRISHNASAMY G, et al. ActNetFormer: Transformer-ResNet hybrid method for semi-supervised action recognition in videos[J]. arXiv:2404.06243, 2024.
[13] LIU Z, NING J, CAO Y, et al. Video Swin Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 3192-3201.
[14] BERTASIUS G, WANG H, TORRESANI L. Is space-time attention all you need for video understanding?[J]. arXiv:2102.05095, 2021.
[15] ARNAB A, DEHGHANI M, HEIGOLD G, et al. ViViT: a video vision Transformer[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 6816-6826.
[16] FEICHTENHOFER C. X3D: expanding architectures for efficient video recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 200-210.
[17] FEICHTENHOFER C, FAN H Q, MALIK J, et al. SlowFast networks for video recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6201-6210.
[18] HARA K, KATAOKA H, SATOH Y. Learning spatio-temporal features with 3D residual networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2017: 3154-3160.
[19] XU Y, ZHANG Q, ZHANG J, et al. ViTAE: vision Transformer advanced by exploring intrinsic inductive bias[C]//Advances in Neural Information Processing Systems, 2021: 28522-28535.
[20] WENG Z, YANG X, LI A, et al. Semi-supervised vision transformers[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 605-620.
[21] SOHN K, BERTHELOT D, LI C L, et al. FixMatch: simplifying semi-supervised learning with consistency and confidence[J]. arXiv:2001.07685, 2020.
[22] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[23] XING Z, DAI Q, HU H, et al. SVFormer: semi-supervised video transformer for action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 18816-18826.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255.
[25] TARVAINEN A, VALPOLA H. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results[C]//Proceedings of the 31st Conference on Neural Information Processing Systems, 2017.
[26] KIM T, OH J, KIM N Y, et al. Comparing Kullback-Leibler divergence and mean squared error loss in knowledge distillation[C]//Proceedings of the 30th International Joint Conference on Artificial Intelligence, 2021: 2628-2635.
[27] HE C Y, ANNAVARAM M, AVESTIMEHR S. Group knowledge transfer: federated learning of large CNNs at the edge[J]. arXiv:2007.14513, 2020.
[28] SUN S, REN W, LI J, et al. Logit standardization in knowledge distillation[J]. arXiv:2403.01427, 2024.
[29] KIM Y, YIM J, YUN J, et al. NLNL: negative learning for noisy labels[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 101-110.
[30] CHEN Y H, TAN X, ZHAO B R, et al. Boosting semi-supervised learning by exploiting all unlabeled data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 7548-7557.
[31] CUBUK E D, ZOPH B, MANÉ D, et al. AutoAugment: learning augmentation strategies from data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 113-123.
[32] BERTHELOT D, CARLINI N, GOODFELLOW I, et al. MixMatch: a holistic approach to semi-supervised learning[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019: 5049-5059.
[33] VERMA V, KAWAGUCHI K, LAMB A, et al. Interpolation consistency training for semi-supervised learning[J]. Neural Networks, 2022, 145: 90-106.
[34] FRENCH G, OLIVER A, SALIMANS T. Milking CowMask for semi-supervised image classification[J]. arXiv:2003.12022, 2020.
[35] XIONG B, FAN H Q, GRAUMAN K, et al. Multiview pseudo-labeling for semi-supervised learning from video[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 7189-7199.
[36] ZOU Y L, CHOI J, WANG Q T, et al. Learning representational invariances for data-efficient action recognition[J]. Computer Vision and Image Understanding, 2023, 227: 103597.
[37] ASSEFA M, JIANG W, ALEMU K G, et al. Actor-aware self-supervised learning for semi-supervised video representation learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(11): 6679-6692.
[38] DAVE I R, RIZVE M N, CHEN C, et al. TimeBalance: temporally-invariant and temporally-distinctive video representations for semi-supervised action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 2341-2352.
[39] LI Y H, MAO H Z, GIRSHICK R, et al. Exploring plain vision transformer backbones for object detection[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 280-296.
[40] XIA C, WANG X, LV F, et al. ViT-CoMer: vision Transformer with convolutional multi-scale feature interaction for dense predictions[J]. arXiv:2403.07392, 2024.
[41] ZHANG C D, XU Y, MO H, et al. Semantic adjustment style transfer network with multi-attention mechanism[J]. Computer Engineering and Applications, 2025, 61(8): 204-214.
[42] ZHANG Y Y, LI X Y, LIU C H, et al. VidTr: video Transformer without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 13557-13567.
[43] NEIMARK D, BAR O, ZOHAR M, et al. Video Transformer network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2021: 3156-3165.
[44] FAN H Q, XIONG B, MANGALAM K, et al. Multiscale vision Transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 6804-6815.
[45] LI K, WANG Y, PENG G, et al. UniFormer: unified Transformer for efficient spatial-temporal representation learning[C]//Proceedings of the International Conference on Learning Representations, 2022.
[46] WANG R, WU Z, CHEN D, et al. Video Mobile-Former: video recognition with efficient global spatial-temporal modeling[J]. arXiv:2208.12257, 2022.
[47] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv:1503.02531, 2015.
[48] GOU J P, YU B S, MAYBANK S J, et al. Knowledge distillation: a survey[J]. International Journal of Computer Vision, 2021, 129(6): 1789-1819.
[49] CHANDRASEGARAN K, TRAN N T, ZHAO Y, et al. Revisiting label smoothing and knowledge distillation compatibility: what was missing?[C]//Proceedings of the International Conference on Machine Learning, 2022: 2890-2916.
[50] LIU J, LIU B, LI H, et al. Meta knowledge distillation[J]. arXiv:2202.07940, 2022.
[51] GUO J, CHEN M H, HU Y, et al. Reducing the teacher-student gap via spherical knowledge distillation[J]. arXiv:2010.07485, 2020.
[52] RIZVE M N, DUARTE K, RAWAT Y S, et al. In defense of pseudo-labeling: an uncertainty-aware pseudo-label selection framework for semi-supervised learning[J]. arXiv:2101.06329, 2021.
[53] CHEN J, SHAH V, KYRILLIDIS A. Negative sampling in semi-supervised learning[C]//Proceedings of the International Conference on Machine Learning, 2020: 1704-1714.
[54] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[55] GRILL J B, STRUB F, ALTCHÉ F, et al. Bootstrap your own latent: a new approach to self-supervised learning[C]//Advances in Neural Information Processing Systems, 2020: 21271-21284.
[56] ZHANG H, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[J]. arXiv:1710.09412, 2017.
[57] YUN S, HAN D, CHUN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6022-6031.
[58] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[J]. arXiv:1212.0402, 2012.
[59] KUEHNE H, JHUANG H, STIEFELHAGEN R, et al. HMDB51: a large video database for human motion recognition[C]//Proceedings of the International Conference on Computer Vision. Piscataway: IEEE, 2013: 571-582.
[60] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[J]. arXiv:2012.12877, 2020.
[61] GOWDA S N, ROHRBACH M, KELLER F, et al. Learn2Augment: learning to composite videos for data augmentation in action recognition[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 242-259.
[62] TONG A Y, TANG C, WANG W J. Semi-supervised action recognition from temporal augmentation using curriculum learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(3): 1305-1319.
[63] IQBAL O, CHAKRABORTY O, HUSSAIN A, et al. SITAR: semi-supervised image Transformer for action recognition[J]. arXiv:2409.02910, 2024.