Computer Engineering and Applications, 2025, Vol. 61, Issue (12): 166-176. DOI: 10.3778/j.issn.1002-8331.2403-0273

• Pattern Recognition and Artificial Intelligence •

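Visual Tracking with an Attention-Fused Feature Aggregation Siamese Network

JIN Jing, NIU Pin, ZHAI Fengwen

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070
  • Publication date: 2025-06-15 Release date: 2025-06-13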

Visual Tracking with Feature Aggregation Siamese Network Fused Attention

JIN Jing, NIU Pin, ZHAI Fengwen   

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Online:2025-06-15 Published:2025-06-13

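Abstract: Current Siamese-network-based object tracking algorithms still cannot make proper use of the valuable contextual information in shallow network features. To address this problem, SiamMCFA (siamese multi-channel feature aggregation module), an object tracking algorithm that incorporates a split-attention (SA) mechanism, is proposed. A split-attention mechanism is introduced into the backbone network to extract the valuable contextual information in shallow features, and a pixel-wise cross correlation (PWCC) module fuses the contextual information in the shallow and deep features of the template and search regions, strengthening the connection between the feature maps of the two regions and thereby improving the robustness of the tracker. To address the tendency to lose the target under scale change, a multi-channel feature aggregation (MCFA) module is designed to aggregate feature information from different regions of the target, so that the tracker distinguishes the target from the semantic background as far as possible, further improving tracking accuracy. Finally, detailed experimental evaluations are carried out on four datasets: OTB100, VOT2019, GOT10K, and LaSOT. The results show that, compared with SiamCAR, an advanced Siamese-network-based tracker, SiamMCFA improves the success rate and precision by 2.26 and 2.83 percentage points, respectively. Compared with SiamIRCA, the success rate and precision improve by 0.3 and 0.9 percentage points.

Key words: object tracking, siamese network, split attention, pixel-wise cross correlation, multi-channel feature aggregation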

Abstract: At present, target tracking algorithms based on Siamese networks still fail to make reasonable use of the valuable context information in the shallow features of the network. To solve this problem, SiamMCFA (siamese multi-channel feature aggregation module), a target tracking algorithm that incorporates a split-attention (SA) mechanism, is proposed. First, a split-attention mechanism is introduced into the backbone network to extract valuable context information from the shallow features. The context information in the shallow and deep features of the template region and the search region is then fused by a pixel-wise cross correlation (PWCC) module, which strengthens the connection between the feature maps of the two regions and improves the robustness of the tracker. Second, to address target loss caused by scale change, a multi-channel feature aggregation (MCFA) module is designed to aggregate the feature information of different regions of the target, so that the tracker can distinguish the target from the semantic background as far as possible, further improving tracking accuracy. Finally, a detailed experimental evaluation is conducted on four datasets: OTB100, VOT2019, GOT10K, and LaSOT. The results show that, compared with SiamCAR, a current advanced tracker based on the Siamese network, SiamMCFA improves the success rate and precision by 2.26 and 2.83 percentage points, respectively. Compared with SiamIRCA, the success rate and precision improve by 0.3 and 0.9 percentage points, respectively.

Key words: object tracking, siamese network, split attention, pixel-wise cross correlation, multi-channel feature aggregation
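
For readers unfamiliar with pixel-wise cross correlation, the following is a minimal NumPy sketch of the operation as it is commonly defined in the Siamese tracking literature: every spatial position of the template feature map is treated as a 1x1 kernel and correlated with the search feature map. It is an illustration under that assumption, not the PWCC module of SiamMCFA, whose exact design is not given in the abstract. The function name `pixelwise_xcorr` and the tensor shapes are hypothetical.

```python
import numpy as np


def pixelwise_xcorr(template, search):
    """Generic pixel-wise cross correlation (illustrative, not the paper's code).

    template: array of shape (C, Hz, Wz), template-region features
    search:   array of shape (C, Hx, Wx), search-region features
    returns:  array of shape (Hz*Wz, Hx, Wx), one response map per template pixel
    """
    c, hz, wz = template.shape
    c2, hx, wx = search.shape
    if c != c2:
        raise ValueError("template and search must have the same channel count")
    # Each column of `kernels` is the C-dimensional feature of one template pixel.
    kernels = template.reshape(c, hz * wz)
    # Each column of `pixels` is the C-dimensional feature of one search pixel.
    pixels = search.reshape(c, hx * wx)
    # Inner product between every template pixel and every search pixel.
    response = kernels.T @ pixels
    return response.reshape(hz * wz, hx, wx)


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 3, 3))   # template features
x = rng.standard_normal((8, 5, 5))   # search features
corr = pixelwise_xcorr(z, x)
print(corr.shape)
```

Because each output channel corresponds to one template position, the response keeps the spatial layout of the search region, and the number of output channels equals the number of template pixels rather than the number of feature channels.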