[1] 欧冰, 杨晶晶. 工业自动化系统中的图像处理技术应用[J]. 集成电路应用, 2023, 40(3): 283-285.
OU B, YANG J J. Application of image processing technology in industrial automation systems[J]. Application of IC, 2023, 40(3): 283-285.
[2] 郭士杰, 卢世杰, 耿艳利, 等. 融合VIT与CNN注意力机制的面部疼痛评估算法研究[J]. 计算机工程与应用, 2024, 60(15): 277-283.
GUO S J, LU S J, GENG Y L, et al. Facial pain assessment algorithm fusing VIT and CNN attention mechanism[J]. Computer Engineering and Applications, 2024, 60(15): 277-283.
[3] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]//Proceedings of the International Conference on Learning Representations, 2021.
[4] TORFI A, SHIRVANI R A, KENESHLOO Y, et al. Natural language processing advancements by deep learning: a survey[J]. arXiv:2003.01200, 2020.
[5] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]//Proceedings of the International Conference on Machine Learning, 2021.
[6] YUN S, HAN D, CHUN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6022-6031.
[7] ZHANG H, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[J]. arXiv:1710.09412, 2017.
[8] HINTON G E, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv:1503.02531, 2015.
[9] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[10] WU K, ZHANG J N, PENG H W, et al. TinyViT: fast pretraining distillation for small vision transformers[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2022: 68-85.
[11] YUAN L, CHEN Y P, WANG T, et al. Tokens-to-token ViT: training vision transformers from scratch on ImageNet[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 538-547.
[12] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
[13] 孙露露, 刘建平, 王健, 等. 细粒度图像分类上Vision Transformer的发展综述[J]. 计算机工程与应用, 2024, 60(10): 30-46.
SUN L L, LIU J P, WANG J, et al. Survey of Vision Transformer in fine-grained image classification[J]. Computer Engineering and Applications, 2024, 60(10): 30-46.
[14] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
[15] PENG Z L, HUANG W, GU S Z, et al. Conformer: local features coupling global representations for visual recognition[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 357-366.
[16] CHEN Y P, DAI X Y, CHEN D D, et al. Mobile-former: bridging MobileNet and transformer[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 5260-5269.
[17] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520.
[18] DAI Z, LIU H, LE Q V, et al. CoAtNet: marrying convolution and attention for all data sizes[C]//Advances in Neural Information Processing Systems, 2021: 3965-3977.
[19] GUO J Y, HAN K, WU H, et al. CMT: convolutional neural networks meet vision transformers[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 12165-12175.
[20] 毛伊敏, 张瑞朋, 高波. 大数据下基于特征图的深度卷积神经网络[J]. 计算机工程与应用, 2022, 58(15): 110-116.
MAO Y M, ZHANG R P, GAO B. Deep convolutional neural network algorithm based on feature map in big data environment[J]. Computer Engineering and Applications, 2022, 58(15): 110-116.
[21] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[22] TOLSTIKHIN I O, HOULSBY N, KOLESNIKOV A, et al. MLP-mixer: an all-MLP architecture for vision[C]//Advances in Neural Information Processing Systems, 2021.
[23] 王志扬, 袁旭, 沈项军, 等. 深度网络去相关层归一化技术研究[J]. 小型微型计算机系统, 2022, 43(5): 1075-1080.
WANG Z Y, YUAN X, SHEN X J, et al. Research on decorrelate layer normalization in deep network[J]. Journal of Chinese Computer Systems, 2022, 43(5): 1075-1080.
[24] 朱威, 屈景怡, 吴仁彪. 结合批归一化的直通卷积神经网络图像分类算法[J]. 计算机辅助设计与图形学学报, 2017, 29(9): 1650-1657.
ZHU W, QU J Y, WU R B. Straight convolutional neural networks algorithm based on batch normalization for image classification[J]. Journal of Computer-Aided Design & Computer Graphics, 2017, 29(9): 1650-1657.
[25] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1800-1807.