Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (6): 208-218. DOI: 10.3778/j.issn.1002-8331.2108-0416

• Graphics and Image Processing •

Research on Object Tracking Algorithm Based on Cascading Feature Fusion of Siamese Network

HAN Ming, WANG Jingqin, WANG Jingtao, MENG Junying

  1. School of Computer Science and Engineering, Shijiazhuang University, Shijiazhuang 050035, China
  2. State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300130, China
  • Online: 2022-03-15 Published: 2022-03-15

Abstract: Under complex conditions such as illumination variation, occlusion, background clutter, and deformation, it is difficult to extract rich feature information accurately during object tracking, which easily leads to tracking drift or loss of the target. In a multilayer neural network, low-level features have high resolution and are suited to object localization, whereas high-level features carry rich semantic information and are suited to object classification. To exploit both, a Siamese network object tracking algorithm based on cascaded feature fusion is proposed. The ResNet-50 network is modified to reduce the number of model parameters and the amount of computation while increasing tracking speed. A cascaded feature fusion strategy fuses the three layers of features in the last stage of ResNet-50 step by step, extracting both high-level semantic information and low-level spatial information so that the object is represented accurately by multiple features. Most tracking algorithms use only the first frame as the object template, which causes the template to degrade during tracking; to address this, a template update mechanism is introduced that updates the template in real time using a similarity threshold method. Comparative experiments on the OTB2015, VOT2016, and VOT2018 benchmark datasets show that the proposed algorithm achieves high tracking accuracy and strong robustness in complex scenes, and is competitive with other algorithms.

Keywords: computer vision, object tracking, Siamese network, feature fusion, template update
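
The abstract mentions a similarity threshold method for template updating but gives no formula. The following is a minimal sketch of one plausible form of such a rule, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity, the threshold of 0.8, the blending rate of 0.1, the choice to update only when similarity is at or above the threshold, and the linear blending of old and new features.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def update_template(template, candidate, threshold=0.8, rate=0.1):
    """Hypothetical threshold-gated template update.

    Assumption: the template is refreshed only when the tracked region
    is similar enough to it, so occluded or drifted frames are skipped.
    Returns the (possibly new) template and a flag saying whether it changed.
    """
    if cosine_similarity(template, candidate) < threshold:
        return list(template), False
    # Assumed update rule: linear interpolation toward the new features.
    blended = [(1.0 - rate) * t + rate * c for t, c in zip(template, candidate)]
    return blended, True


# Toy feature vectors standing in for flattened template features.
template = [1.0, 0.0, 0.0]
similar = [0.9, 0.1, 0.0]    # close to the template: update is accepted
occluded = [0.0, 1.0, 0.0]   # unrelated appearance: update is rejected

new_template, updated = update_template(template, similar)
_, updated_occluded = update_template(template, occluded)
```

In a real tracker the vectors would be feature maps from the backbone rather than three-element lists, and the threshold and rate would have to be tuned on a validation set.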