[1] 陈龙, 张建林, 彭昊, 等. 多尺度注意力与领域自适应的小样本图像识别[J]. 光电工程, 2023, 50(4): 66-80.
CHEN L, ZHANG J L, PENG H, et al. Few-shot image classification via multi-scale attention and domain adaptation[J]. Opto-Electronic Engineering, 2023, 50(4): 66-80.
[2] 赵春梅, 陈忠碧, 张建林. 基于深度学习的飞机目标跟踪应用研究[J]. 光电工程, 2019, 46(9): 1-10.
ZHAO C M, CHEN Z B, ZHANG J L. Application of aircraft target tracking based on deep learning[J]. Opto-Electronic Engineering, 2019, 46(9): 1-10.
[3] 石超, 陈恩庆, 齐林. 红外视频中的舰船检测[J]. 光电工程, 2018, 45(6): 86-91.
SHI C, CHEN E Q, QI L. Ship detection from infrared video[J]. Opto-Electronic Engineering, 2018, 45(6): 86-91.
[4] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[5] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 10347-10357.
[6] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[7] LIU Z, HU H, LIN Y, et al. Swin transformer v2: scaling up capacity and resolution[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 12009-12019.
[8] WANG W, XIE E, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 568-578.
[9] WU H, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 22-31.
[10] PAN Z, ZHUANG B, LIU J, et al. Scalable vision transformers with hierarchical pooling[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 377-386.
[11] GUO J, HAN K, WU H, et al. CMT: convolutional neural networks meet vision transformers[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 12175-12185.
[12] YUAN K, GUO S, LIU Z, et al. Incorporating convolution designs into visual transformers[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 579-588.
[13] CHEN Y, DAI X, CHEN D, et al. Mobile-former: bridging MobileNet and transformer[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 5270-5279.
[14] MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer[J]. arXiv:2110.02178, 2021.
[15] ZHANG W, HUANG Z, LUO G, et al. TopFormer: token pyramid transformer for mobile semantic segmentation[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 12083-12093.
[16] SANDLER M, HOWARD A, ZHU M, et al. Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation[J]. arXiv:1801.04381, 2018.
[17] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[18] TAN M, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 6105-6114.
[19] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1800-1807.
[20] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7464-7475.
[21] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015: 448-456.
[22] VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[C]//Advances in Neural Information Processing Systems 29, 2016.