Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (22): 174-181. DOI: 10.3778/j.issn.1002-8331.2208-0425

• Graphics and Image Processing •

Object Tracking Algorithm with Sparse Self-Attention

WANG Jindong, ZHANG Jinglei, WEN Biao   

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China
  2. Tianjin Key Laboratory for Control Theory and Applications in Complicated Systems, Tianjin University of Technology, Tianjin 300384, China
  • Online: 2023-11-15   Published: 2023-11-15

Abstract: To address the high computational complexity caused by applying multi-head self-attention in the feature enhancement stage of Transformer-based object tracking algorithms, a sparse self-attention method is proposed, yielding an object tracking algorithm (E-TransT) with linear computational complexity. Firstly, a pyramid split attention module is added to the feature extraction network and the network output structure is adjusted, so that the extracted features carry multi-scale contextual information. Secondly, an improved self-attention enhancement module is designed using the sparse self-attention method, which effectively reduces the number of parameters in the self-attention computation and lowers the computational complexity while preserving the ability to capture pixel-level detail. Five benchmarks, including LaSOT and TrackingNet, are used to evaluate the algorithm. The results show that on the main metrics, such as tracking success rate and precision, the proposed method outperforms eleven classical algorithms such as TransT and SiamR-CNN.
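To illustrate the linear-complexity idea described above, the following is a minimal PyTorch sketch of one common sparse self-attention scheme, in which keys and values are projected from sequence length N down to a fixed length k (Linformer-style), so the attention map is N×k rather than N×N. The abstract does not specify E-TransT's exact sparsification, so the class name LinearSparseSelfAttention, the projection scheme, and all hyperparameters here are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class LinearSparseSelfAttention(nn.Module):
    # Illustrative linear-complexity self-attention: keys and values are
    # compressed along the sequence axis from length seq_len to a fixed
    # length k, so the attention map is N x k instead of N x N.
    # This is one common scheme (Linformer-style); E-TransT's actual
    # sparse self-attention may differ.
    def __init__(self, dim, num_heads=8, seq_len=1024, k=64):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj_k = nn.Linear(seq_len, k, bias=False)  # compress keys
        self.proj_v = nn.Linear(seq_len, k, bias=False)  # compress values
        self.out = nn.Linear(dim, dim)

    def forward(self, x):               # x: (B, N, C); N must equal seq_len
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (B, heads, N, head_dim)
        q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # compress the sequence axis: (B, heads, k, head_dim)
        k = self.proj_k(k.transpose(-1, -2)).transpose(-1, -2)
        v = self.proj_v(v.transpose(-1, -2)).transpose(-1, -2)
        attn = (q @ k.transpose(-1, -2)) * self.scale    # (B, heads, N, k)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.out(out)

# Example: 1024 tokens, 256 channels -> output shape (2, 1024, 256)
x = torch.randn(2, 1024, 256)
print(LinearSparseSelfAttention(dim=256, seq_len=1024)(x).shape)

Because the key/value length k is fixed, both memory and compute scale linearly in the token count N, which is the property the abstract attributes to E-TransT's sparse self-attention.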

Key words: object tracking, Siamese network, sparse self-attention, multi-scale contextual information
