Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (24): 135-143. DOI: 10.3778/j.issn.1002-8331.2105-0377

• Pattern Recognition and Artificial Intelligence •

Object Tracking Based on a Double-Branch Siamese Network

邱守猛 (QIU Shoumeng), 谷宇章 (GU Yuzhang), 袁泽强 (YUAN Zeqiang)

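  1. Bionic Vision System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050
  2. University of Chinese Academy of Sciences, Beijing 100049
  • Publication date: 2021-12-15  Release date: 2021-12-13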

Double Adjust Head Siamese Network for Object Tracking

QIU Shoumeng, GU Yuzhang, YUAN Zeqiang   

  1. Bionic Vision System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2021-12-15  Published: 2021-12-13

Abstract (from the Chinese version):

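Siamese-network-based tracking algorithms model tracking as a matching problem between target features and search-region features. The degree of matching is usually measured by the correlation response between the two sets of features. This measure still has the following limitations. First, the same feature extractor is applied to every region of the target, with no regard for the difference between the target's interior and its contour. Second, the spatial structure of the template is fixed while the correlation between features is computed, so the tracker copes poorly with target deformation and its robustness suffers. To address these problems, a double-branch Siamese network tracking algorithm, SiamDAH (Double Adjust Head Siamese Network for Object Tracking), is proposed, in which the double-branch structure accounts for the different representation requirements of the target's interior and its contour. In addition, an improved pixel-wise correlation module is proposed, which effectively mitigates the problems caused by the fixed template structure in conventional correlation operations. Experimental results on the GOT-10k dataset show that the proposed algorithm improves on the baseline algorithm by 3.4%, 7.0%, and 2.3% in AO, SR0.5, and SR0.75, respectively. It runs at up to 90 frame/s on an NVIDIA RTX 2080Ti.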

Keywords: object tracking, Siamese network, double branch, pixel-wise correlation

Abstract:

Siamese-network-based trackers formulate tracking as a similarity matching problem between a target template and a search region. Virtually all popular Siamese trackers use cross-correlation to measure the similarity between the features of the template and those of the search image. In these trackers, feature extraction places the same emphasis on different regions of the target (interior and contours). Moreover, global matching seriously neglects part-level information and the deformation of targets during tracking. In this paper, a simple but effective Double Adjust Head Siamese Network is proposed to extract features from the object interior and the object contours separately. A Pixelwise Cross-correlation module (PWC) is designed to address the problem caused by the fixed template structure in conventional correlation operations. Compared with the baseline algorithm, the proposed algorithm increases AO, SR0.5, and SR0.75 on the GOT-10k dataset by 3.4%, 7.0%, and 2.3%, respectively. The tracker runs at over 90 frames per second on an RTX 2080Ti GPU.

Key words: visual tracking, Siamese network, double adjust-head, pixelwise cross-correlation
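
The contrast the abstract draws between conventional cross-correlation and pixel-wise correlation can be illustrated with a small NumPy sketch. This is a generic illustration only, not the paper's PWC module, whose specific improvements the abstract does not describe. The function names, tensor shapes, and random inputs are hypothetical choices made for the example.

```python
import numpy as np

def cross_correlation(template, search):
    """Conventional correlation: slide the whole template over the search features.

    template: (C, Hz, Wz), search: (C, Hx, Wx)
    returns:  (Hx - Hz + 1, Wx - Wz + 1) single-channel response map
    """
    c, hz, wz = template.shape
    _, hx, wx = search.shape
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The template is matched as one rigid block at offset (i, j).
            window = search[:, i:i + hz, j:j + wz]
            out[i, j] = np.sum(window * template)
    return out

def pixelwise_correlation(template, search):
    """Generic pixel-wise correlation: each template position acts as a 1x1 kernel.

    returns: (Hz * Wz, Hx, Wx); channel k holds the response of template
    position k (row-major order) at every search position.
    """
    c, hz, wz = template.shape
    _, hx, wx = search.shape
    kernels = template.reshape(c, hz * wz)   # (C, Hz*Wz)
    pixels = search.reshape(c, hx * wx)      # (C, Hx*Wx)
    return (kernels.T @ pixels).reshape(hz * wz, hx, wx)

# Hypothetical feature maps: 8 channels, 3x3 template, 7x7 search region.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 3, 3))
x = rng.standard_normal((8, 7, 7))
print(cross_correlation(z, x).shape)      # (5, 5)
print(pixelwise_correlation(z, x).shape)  # (9, 7, 7)
```

In this sketch the conventional response is a fixed-offset sum of the pixel-wise channels, so the spatial layout of the template is baked into the result. The pixel-wise output keeps one channel per template position, which leaves later layers free to combine them in a way that is less tied to that rigid layout.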