[1] FAN H, LING H. Siamese cascaded region proposal networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7952-7961.
[2] 刘艺, 李蒙蒙, 郑奇斌, 等. 视频目标跟踪算法综述[J]. 计算机科学与探索, 2022, 16(7): 1504-1515.
LIU Y, LI M M, ZHENG Q B, et al. Survey on video object tracking algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1504-1515.
[3] 蒋凌云, 杨金龙. 检测优化的标签多伯努利视频多目标跟踪算法[J]. 计算机科学与探索, 2023, 17(6): 1343-1358.
JIANG L Y, YANG J L. Detection optimized labeled multi-Bernoulli algorithm for visual multi-target tracking[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(6): 1343-1358.
[4] BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA: IEEE, 2010: 2544-2550.
[5] MA H, LIN Z, ACTON S T. FAST: fast and accurate scale estimation for tracking[J]. IEEE Signal Processing Letters, 2020, 27: 161-165.
[6] ZHANG L, VARADARAJAN J, SUGANTHAN P N, et al. Robust visual tracking using oblique random forests[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul 21-26, 2017. Piscataway: IEEE, 2017: 5825-5834.
[7] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]//Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865.
[8] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 1781-1789.
[9] LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018: 8971-8980.
[10] ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, 2019: 4586-4595.
[11] GUO D Y, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, Jun 13-19, 2020. Piscataway: IEEE, 2020: 6268-6276.
[12] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[13] SUN C, SHRIVASTAVA A, SINGH S, et al. Revisiting unreasonable effectiveness of data in deep learning era[C]//Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 843-852.
[14] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 2009: 248-255.
[15] TAN M, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 2019: 6105-6114.
[16] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]//Proceedings of the International Conference on Machine Learning, 2021: 10347-10357.
[17] WANG W, XIE E, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 2021: 568-578.
[18] WU H, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]//Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 2021: 22-31.
[19] YUAN L, CHEN Y, WANG T, et al. Tokens-to-token ViT: training vision transformers from scratch on ImageNet[C]//Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 2021: 558-567.
[20] RAO Y, ZHAO W, LIU B, et al. DynamicViT: efficient vision transformers with dynamic token sparsification[C]//Advances in Neural Information Processing Systems, 2021, 34: 13937-13949.
[21] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 2021: 10012-10022.
[22] HEO B, YUN S, HAN D, et al. Rethinking spatial dimensions of vision transformers[C]//Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 2021: 11936-11945.
[23] 潘昊, 刘翔, 赵静文, 等. 联合Transformer与BYTE数据关联的多目标实时跟踪算法[J]. 激光与光电子学进展, 2023, 60(6): 154-161.
PAN H, LIU X, ZHAO J W, et al. Multitarget real-time tracking algorithm based on Transformer and BYTE data association[J]. Laser & Optoelectronics Progress, 2023, 60(6): 154-161.
[24] HU H, ZHANG Z, XIE Z, et al. Local relation networks for image recognition[C]//Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 2019: 3464-3473.
[25] WANG H, ZHU Y, GREEN B, et al. Axial-deeplab: stand-alone axial-attention for panoptic segmentation[C]//Proceedings of the 2020 European Conference on Computer Vision, Glasgow, UK, 2020: 108-126.
[26] HUANG L, YUAN Y, GUO J, et al. Interlaced sparse self-attention for semantic segmentation[J]. arXiv:1907.12273, 2019.
[27] VASWANI A, RAMACHANDRAN P, SRINIVAS A, et al. Scaling local self-attention for parameter efficient visual backbones[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021: 12894-12904.
[28] CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision transformers[C]//Advances in Neural Information Processing Systems, 2021, 34: 9355-9366.
[29] LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 4282-4291.
[30] BA J L, KIROS J R, HINTON G E. Layer normalization[J]. arXiv:1607.06450, 2016.
[31] REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 658-666.
[32] HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
[33] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 2014 European Conference on Computer Vision, Zurich, Switzerland, 2014: 740-755.
[34] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[35] WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[36] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 2010: 249-256.
[37] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[J]. arXiv:1711.05101, 2017.