[1] 张瑶, 卢焕章, 张路平, 等. 基于深度学习的视觉多目标跟踪算法综述[J]. 计算机工程与应用, 2021, 57(13): 55-66.
ZHANG Y, LU H Z, ZHANG L P, et al. Overview of visual multi-object tracking algorithms with deep learning[J]. Computer Engineering and Applications, 2021, 57 (13): 55-66.
[2] 周海赟, 项学智, 王馨遥, 等. 多特征融合的端到端链式行人多目标跟踪网络[J]. 计算机工程, 2022, 48(9): 305-313.
ZHOU H Y, XIANG X Z, WANG X Y, et al. Chained end-to-end pedestrian multi-object tracking network with multi-feature fusion[J]. Computer Engineering, 2022, 48(9): 305-313.
[3] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008.
[4] 刘文婷, 卢新明. 基于计算机视觉的Transformer研究进展[J]. 计算机工程与应用, 2022, 58(6): 1-16.
LIU W T, LU X M. Research progress of transformer based on computer vision[J]. Computer Engineering and Applications, 2022, 58(6): 1-16.
[5] PENG Z L, HUANG W, GU S Z, et al. Conformer: local features coupling global representations for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9454-9468.
[6] 田永林, 王雨桐, 王建功, 等. 视觉Transformer研究的关键问题: 现状及展望[J]. 自动化学报, 2022, 48(4): 957-979.
TIAN Y L, WANG Y T, WANG J G, et al. Key problems and progress of vision transformers: the state of the art and prospects[J]. Acta Automatica Sinica, 2022, 48(4): 957-979.
[7] WANG L, PHAM T, NG T, et al. Learning deep features for multiple object tracking by using a multi-task learning strategy[C]//2014 IEEE International Conference on Image Processing (ICIP), 2014: 838-842.
[8] KIM C, LI F, CIPTADI A, et al. Multiple hypothesis tracking revisited[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 4696-4704.
[9] CHEN L, AI H, SHANG C, et al. Online multi-object tracking with convolutional neural networks[C]//2017 IEEE International Conference on Image Processing (ICIP), 2017: 645-649.
[10] WOJKE N, BEWLEY A, PAULUS D, et al. Simple online and realtime tracking with a deep association metric[C]//2017 IEEE International Conference on Image Processing (ICIP), 2017: 3645-3649.
[11] BEWLEY A, GE Z, OTT L, et al. Simple online and realtime tracking[C]//2016 IEEE International Conference on Image Processing (ICIP), 2016: 3464-3468.
[12] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[13] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[14] YUAN L, CHEN Y, WANG T, et al. Tokens-to-Token ViT: training vision transformers from scratch on ImageNet[J]. arXiv:2101.11986, 2021.
[15] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
[16] SUN P, CAO J, JIANG Y, et al. TransTrack: multiple object tracking with transformer[J]. arXiv:2012.15460, 2020.
[17] MEINHARDT T, KIRILLOV A, LEAL-TAIXE L, et al. TrackFormer: multi-object tracking with Transformers[J]. arXiv:2101.02702, 2021.
[18] 江英杰, 宋晓宁. 基于视觉Transformer的双流目标跟踪算法[J]. 计算机工程与应用, 2022, 58(12): 183-190.
JIANG Y J, SONG X N. Double stream target tracking algorithm based on visual transformer[J]. Computer Engineering and Applications, 2022, 58(12): 183-190.
[19] CHEN Y, DAI X, CHEN D, et al. Mobile-Former: bridging MobileNet and transformer[J]. arXiv:2108.05895, 2021.
[20] MA N, ZHANG X, ZHENG H, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//2018 European Conference on Computer Vision (ECCV), 2018: 122-138.
[21] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 4510-4520.
[22] DENDORFER P, OŠEP A, MILAN A, et al. MOTChallenge: a benchmark for single-camera multiple target tracking[J]. International Journal of Computer Vision, 2021, 129: 845-881.
[23] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2012: 3354-3361.
[24] WEN L, DU D, CAI Z, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking[J]. Computer Vision and Image Understanding, 2020, 193: 102907.
[25] XU Y, BAN Y, DELORME G, et al. TransCenter: transformer with dense queries for multiple-object tracking[J]. arXiv:2103.15145, 2021.
[26] CHU P, WANG J, YOU Q, et al. TransMOT: spatial-temporal graph transformer for multiple object tracking[J]. arXiv:2104.00194, 2021.