Research on Object Tracking Algorithm Based on Cascading Feature Fusion of Siamese Network

doi:10.3778/j.issn.1002-8331.2108-0416

Abstract

Under complex conditions such as illumination variation, occlusion, background clutter and deformation, it is difficult to extract rich feature information accurately during object tracking, which easily causes the tracker to drift or lose the target. In a multilayer neural network, low-level features have high resolution and are suited to locating the object, while high-level features carry rich semantic information and are suited to classifying it. To take full advantage of both, a Siamese network object tracking algorithm with cascading feature fusion is proposed. The ResNet-50 network is modified to reduce the number of model parameters and the amount of computation, which increases tracking speed. A cascading feature fusion strategy fuses the three layers of features in the last stage of ResNet-50 step by step, effectively extracting both the high-level semantic information and the low-level spatial information of the object and yielding an accurate multi-feature representation of it. Because most algorithms use only the first frame as the object template, the template degrades as tracking proceeds; a template update mechanism is therefore introduced that updates the template in real time using a similarity threshold. Extensive comparative experiments are conducted on OTB2015, VOT2016 and VOT2018. The results show that the proposed algorithm achieves higher tracking accuracy and stronger robustness in complex scenes, and is highly competitive with the other algorithms compared.
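The abstract names two mechanisms, cascading fusion of the three last-stage ResNet-50 features and a similarity-threshold template update, without giving their formulas. The NumPy sketch below illustrates one plausible reading under stated assumptions: fusion is concatenation followed by a 1x1 projection and ReLU, the matching score is a plain Siamese cross-correlation, and the template is blended with the new target feature only when their cosine similarity reaches a threshold. The operators, the values of `tau` and `lr`, and every function name are hypothetical and are not the authors' implementation; random arrays stand in for real ResNet-50 features.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: mix the channels of x (C_in, H, W) with weight (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def cascade_fuse(feats, weights):
    """Hypothetical cascading fusion of same-sized (C, H, W) maps, shallow to deep.

    Step k: fused = ReLU(W_k * concat(fused, feats[k])), so each deeper map is
    merged into the running result (assumed operator, not the paper's).
    """
    fused = feats[0]
    for feat, w in zip(feats[1:], weights):
        stacked = np.concatenate([fused, feat], axis=0)  # (2C, H, W)
        fused = np.maximum(conv1x1(stacked, w), 0.0)     # project to (C, H, W), ReLU
    return fused


def xcorr(z, x):
    """Naive Siamese cross-correlation: slide template z (C, hz, wz) over x (C, hx, wx)."""
    _, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z * x[:, i:i + hz, j:j + wz])
    return out


def cosine(a, b):
    """Cosine similarity of two feature maps, flattened."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def update_template(z, z_new, tau=0.8, lr=0.1):
    """Assumed similarity-threshold rule: blend in z_new only if cosine(z, z_new) >= tau.

    Returns (template, updated_flag); tau and lr are illustrative values.
    """
    if cosine(z, z_new) >= tau:
        return (1.0 - lr) * z + lr * z_new, True
    return z, False


# Demo with random stand-ins for the three last-stage ResNet-50 features.
rng = np.random.default_rng(0)
C = 8
weights = [0.1 * rng.standard_normal((C, 2 * C)) for _ in range(2)]
z = cascade_fuse([rng.standard_normal((C, 7, 7)) for _ in range(3)], weights)
x = cascade_fuse([rng.standard_normal((C, 31, 31)) for _ in range(3)], weights)
response = xcorr(z, x)
print(response.shape)  # (25, 25)

z_similar = z + 0.01 * rng.standard_normal(z.shape)  # e.g. a clean new frame
z_template, updated = update_template(z, z_similar)
print(updated)  # True
_, updated = update_template(z, rng.standard_normal(z.shape))  # unrelated patch
print(updated)  # False
```

Gating the update on similarity is intended to keep occluded or drifted frames from contaminating the template, while still letting it follow gradual appearance change; a lower `tau` updates more often at a higher risk of drift.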

Keywords: computer vision, object tracking, Siamese network, feature fusion, template update

HAN Ming, WANG Jingqin, WANG Jingtao, MENG Junying. Research on Object Tracking Algorithm Based on Cascading Feature Fusion of Siamese Network[J]. Computer Engineering and Applications, 2022, 58(6): 208-218.

韩明, 王景芹, 王敬涛, 孟军英. 级联特征融合孪生网络目标跟踪算法研究[J]. 计算机工程与应用, 2022, 58(6): 208-218.
