Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (12): 279-290. DOI: 10.3778/j.issn.1002-8331.2403-0298

• Graphics and Image Processing •

Transformer Tracking Algorithm Integrating Pixel-Wise Cross-Correlation

XUE Zihan (薛紫涵), GE Haibo (葛海波), YANG Yudi (杨雨迪), TIAN Panshuai (田攀帅)

  1. School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online: 2025-06-15  Published: 2025-06-13

Abstract: The cross-correlation operation in Siamese networks performs local matching and therefore cannot effectively capture global context information. Transformer networks, in contrast, rely on global relationships to obtain semantic information but need more local edge information to distinguish the target from the background. This paper therefore proposes a target tracking algorithm that combines pixel-wise cross-correlation (PW-Corr) with a Transformer. A parallel encoder is constructed, and non-linear reweighting attention (NRA) is used to improve the ability of the Transformer to capture global context. A decoder is designed that integrates pixel-wise cross-correlation, so that interaction along both the spatial and channel dimensions improves the accuracy of feature fusion and filters out redundant background interference. For the classification and regression tasks, the algorithm uses a classification head based on a multi-layer perceptron (MLP) and a regression head with a global context awareness module (GCAM), which capture global information while extracting local information about the target and thereby help the algorithm localize the tracked target accurately. Experimental results show that the improved algorithm reaches a success rate of 70.6% and an accuracy rate of 92.1% on the OTB100 dataset, improving both the tracking success rate and accuracy.

Key words: Transformer network, pixel-wise cross-correlation, attention mechanism, global context awareness
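
The abstract names pixel-wise cross-correlation (PW-Corr) but does not give its formulation. As a minimal sketch, assuming the commonly used definition in which every spatial position of the template feature map acts as a 1×1 kernel over the search feature map, the operation can be written in NumPy as follows. The function name `pixelwise_xcorr`, the tensor shapes, and the random inputs are illustrative assumptions, not taken from the paper, and the paper's decoder may apply further spatial and channel interaction on top of this.

```python
import numpy as np


def pixelwise_xcorr(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Pixel-wise cross-correlation (assumed common definition).

    template: (C, Hz, Wz) feature map of the target template.
    search:   (C, Hx, Wx) feature map of the search region.
    Returns a (Hz*Wz, Hx, Wx) map: channel k holds the similarity between
    template pixel k (row-major order) and every search pixel.
    """
    c, hz, wz = template.shape
    c_search, hx, wx = search.shape
    if c != c_search:
        raise ValueError("template and search must have the same channel count")

    # Each column of `kernels` is one template pixel, i.e. a 1x1 kernel with C channels.
    kernels = template.reshape(c, hz * wz)
    pixels = search.reshape(c, hx * wx)

    # Dot product over channels between every template pixel and every search pixel.
    corr = kernels.T @ pixels  # shape (Hz*Wz, Hx*Wx)
    return corr.reshape(hz * wz, hx, wx)


# Hypothetical feature sizes chosen only for illustration.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4, 4))    # template features
x = rng.standard_normal((8, 16, 16))  # search-region features
corr = pixelwise_xcorr(z, x)
print(corr.shape)
```

Unlike naive cross-correlation, which slides the whole template as one kernel and yields a single response channel, this variant keeps one response channel per template pixel, so finer spatial detail of the target is passed on to later layers.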