Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (22): 174-181. DOI: 10.3778/j.issn.1002-8331.2208-0425

• Graphics and Image Processing •

Object Tracking Algorithm with Sparse Self-Attention

WANG Jindong, ZHANG Jinglei, WEN Biao   

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China
  2. Tianjin Key Laboratory for Control Theory and Applications in Complicated Systems, Tianjin University of Technology, Tianjin 300384, China
  • Online: 2023-11-15    Published: 2023-11-15

Abstract: To address the high computational complexity incurred by multi-head self-attention in the feature-enhancement stage of Transformer-based object tracking algorithms, a sparse self-attention method is proposed, yielding an object tracking algorithm (E-TransT) with linear computational complexity. First, a pyramid split attention module is added to the feature extraction network and the network's output structure is adjusted, so that the extracted features carry multi-scale contextual information. Second, an improved self-attention enhancement module is designed using the sparse self-attention method; it substantially reduces the number of parameters involved in the attention computation, lowering computational complexity while retaining the ability to capture pixel-level detail. The algorithm is evaluated on five benchmarks, including LaSOT and TrackingNet. The results show that, on the main metrics such as tracking success rate and precision, the proposed method outperforms eleven classic algorithms including TransT and SiamR-CNN.
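The abstract does not specify the exact sparsity pattern, so the following is only a minimal PyTorch sketch of the general idea, not the paper's actual E-TransT module: restricting each token's attention to a fixed-size block of `window` neighbours brings the attention cost from O(N^2) down to O(N * window), i.e. linear in the sequence length N for a fixed window. The class name BlockSparseSelfAttention, the `window` parameter, and all shapes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSparseSelfAttention(nn.Module):
    # Hypothetical sketch (not the paper's module): each token attends only
    # within its own block of `window` tokens, so the attention matrix is
    # (num_blocks, window, window) rather than (N, N) -- linear in N.
    def __init__(self, dim, window=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.window = window
        self.scale = dim ** -0.5

    def forward(self, x):                               # x: (B, N, C)
        B, N, C = x.shape
        w = self.window
        pad = (w - N % w) % w                           # pad N up to a multiple of w
        if pad:
            x = F.pad(x, (0, 0, 0, pad))
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        nb = x.shape[1] // w                            # number of blocks
        q = q.reshape(B, nb, w, C)
        k = k.reshape(B, nb, w, C)
        v = v.reshape(B, nb, w, C)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, nb, w, w), never (B, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(B, nb * w, C)[:, :N]   # drop the padding rows
        return self.proj(out)

# Quick shape check: 2 sequences of 100 tokens, 64 channels.
x = torch.randn(2, 100, 64)
print(BlockSparseSelfAttention(dim=64, window=8)(x).shape)  # torch.Size([2, 100, 64])

In practice, block-local patterns like this are often combined with overlapping windows or a few global tokens so that information can still flow across blocks; the paper's own sparse self-attention design may differ in these details.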

Key words: object tracking, Siamese network, sparse self-attention, multi-scale contextual information