Computer Engineering and Applications, 2022, Vol. 58, Issue (23): 161-168. DOI: 10.3778/j.issn.1002-8331.2105-0140

• Pattern Recognition and Artificial Intelligence •

Research on Tracking Algorithm Based on Attention Mechanism and Siamese Network

WANG Ling, ZHOU Lei, WANG Peng, BAI Yan’e   

  1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
  • Online:2022-12-01 Published:2022-12-01

Abstract: This paper proposes ThrAtt-Siam, a Siamese network tracker that fuses a convolutional channel attention mechanism, a stacked channel attention mechanism, and a spatial attention mechanism to improve tracking performance. ThrAtt-Siam is built on SiameseFC. Fusing the convolutional channel attention mechanism, two feature maps, and two convolutional blocks in the lower convolutional layers strengthens target feature extraction and improves the tracker's resistance to background interference as well as its discriminative ability. The stacked channel attention mechanism and the spatial attention mechanism are fused in the target image branch. The stacked channel attention mechanism effectively separates useful features from useless ones and extracts the useful features of each channel, while the spatial attention mechanism supplements the target features with spatial information that the channel attention does not capture, which helps localize the target more accurately. Experimental results on the OTB2015 and VOT2017 datasets show that ThrAtt-Siam achieves good tracking accuracy and success rates under target deformation, low resolution, and occlusion.
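
The abstract names the building blocks but not their exact form. As a rough sketch only, assuming an SE-style channel attention, a CBAM-style spatial attention, and SiameseFC's plain cross-correlation as stand-ins (the paper's convolutional and stacked channel attention modules may well differ), the NumPy code below shows how attention reweights template features and how a response map is produced by sliding the template over the search region. All function names, the reduction ratio `r`, the 3×3 kernel, and the random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def channel_attention(feat, w1, w2):
    """SE-style channel attention (assumed stand-in, not the paper's module).
    feat: (C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    squeeze = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)    # bottleneck FC + ReLU
    weights = sigmoid(w2 @ hidden)            # one weight per channel, in (0, 1)
    return feat * weights[:, None, None]      # rescale whole channels ("what")

def spatial_attention(feat, kernel):
    """CBAM-style spatial attention (assumed stand-in).
    feat: (C, H, W); kernel: (2, k, k) with odd k."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    H, W = feat.shape[1:]
    logits = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]  # rescale positions ("where")

def cross_correlation(z, x):
    """SiameseFC-style response map: slide template z (C, h, w) over x (C, H, W)."""
    _, h, w = z.shape
    _, H, W = x.shape
    resp = np.empty((H - h + 1, W - w + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return resp

# Usage with toy data: embed a random template in an empty search region.
rng = np.random.default_rng(0)
C, r = 8, 2
z = rng.standard_normal((C, 4, 4))            # template (target) features
x = np.zeros((C, 16, 16))                     # search-region features
x[:, 5:9, 3:7] = z                            # target located at row 5, col 3
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
kernel = rng.standard_normal((2, 3, 3))

z_att = spatial_attention(channel_attention(z, w1, w2), kernel)
resp = cross_correlation(z, x)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(resp), resp.shape))
print(z_att.shape, resp.shape, peak)
```

Channel attention rescales entire feature maps (which features matter), whereas spatial attention rescales individual positions (where the target is), which is why the two are complementary. In a trained tracker the weights would be learned, and the correlation would run on the attention-refined template features rather than the raw template used here for a clean sanity check.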

Key words: feature fusion, Siamese network, attention mechanism, target tracking