Computer Engineering and Applications (计算机工程与应用), 2022, Vol. 58, Issue (23): 221-229. DOI: 10.3778/j.issn.1002-8331.2106-0470

• Graphics and Image Processing •

Target Tracking Based on Conditional Confrontation Network and Hierarchical Feature Fusion

ZHANG Lei (张磊), SHAN Yugang (单玉刚), YUAN Jie (袁杰)

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830001, China
  2. School of Education, Hubei University of Arts and Science, Xiangyang, Hubei 441053, China
  • Online: 2022-12-01  Published: 2022-12-01

Abstract: To address the degradation in tracking performance caused by motion blur and low resolution, this paper proposes a target tracking algorithm based on a conditional adversarial network and hierarchical feature fusion. First, a conditional generative adversarial network model (DeblurGAN-v2) deblurs the input low-resolution video frames. Then, an improved VGG-19 network extracts the Conv2, Conv4, and Conv6 features of the target candidate region, and the low-level structural features, mid-level features, and high-level semantic features extracted by the Siamese network are fused to strengthen the feature representation. Experiments on the tracking benchmarks OTB2015 and VOT2018 show that the proposed algorithm is more accurate than SiamFC, SiamDW, and other algorithms, and that it adapts to complex conditions such as target occlusion, motion blur, appearance change, and background interference. Compared with SiamFC, the improved algorithm raises the success rate on OTB2015 by 5.5 percentage points and the EAO on VOT2018 by 16.4 percentage points.

Key words: target tracking, conditional adversarial network, Siamese network, feature fusion
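
The abstract names three feature levels and a Siamese matching step but does not say how the levels are combined. The toy sketch below illustrates one common possibility and is not the authors' implementation: each level produces a response map by cross-correlating a template with a search region, the maps are min-max normalized, and a weighted sum is taken. The single-channel features, the shared spatial size, the weights, and all function names (`cross_correlate`, `normalize`, `fuse`, `peak`, `place`, `scaled`) are assumptions made for illustration.

```python
# Toy sketch: fuse Siamese response maps from three feature levels.
# Assumptions (not from the paper): single-channel features that are
# already resampled to one size, and fusion as a weighted sum of
# min-max-normalized response maps.


def cross_correlate(search, template):
    """Valid-mode 2-D cross-correlation of a template over a search map."""
    th, tw = len(template), len(template[0])
    rows = len(search) - th + 1
    cols = len(search[0]) - tw + 1
    return [[sum(search[i + u][j + v] * template[u][v]
                 for u in range(th) for v in range(tw))
             for j in range(cols)]
            for i in range(rows)]


def normalize(resp):
    """Min-max scale a response map to [0, 1] so levels are comparable."""
    flat = [x for row in resp for x in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[(x - lo) / span for x in row] for row in resp]


def fuse(responses, weights):
    """Weighted sum of equally sized response maps."""
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[sum(w * r[i][j] for w, r in zip(weights, responses))
             for j in range(cols)]
            for i in range(rows)]


def peak(resp):
    """(row, col) of the largest response; the first one wins on ties."""
    cells = [(i, j) for i in range(len(resp)) for j in range(len(resp[0]))]
    return max(cells, key=lambda ij: resp[ij[0]][ij[1]])


def place(patches, size=6):
    """Build a size x size map of zeros with patches pasted at (top, left)."""
    grid = [[0.0] * size for _ in range(size)]
    for patch, top, left in patches:
        for u, row in enumerate(patch):
            for v, value in enumerate(row):
                grid[top + u][left + v] = value
    return grid


def scaled(patch, s):
    """Multiply every value in a patch by s."""
    return [[s * x for x in row] for row in patch]


target = [[1.0, 2.0], [3.0, 4.0]]
distractor = [[1.0, 2.0], [3.0, 5.0]]  # look-alike, low level only

# (search map, template) per level; the target sits at row 2, col 3.
levels = [
    (place([(target, 2, 3), (distractor, 0, 0)]), target),        # low
    (place([(scaled(target, 0.5), 2, 3)]), scaled(target, 0.5)),  # mid
    (place([(scaled(target, 2.0), 2, 3)]), scaled(target, 2.0)),  # high
]

responses = [normalize(cross_correlate(s, t)) for s, t in levels]
weights = [0.2, 0.3, 0.5]  # hypothetical weights, not from the paper
fused = fuse(responses, weights)

low_peak = peak(responses[0])
fused_peak = peak(fused)
print("low-level peak:", low_peak)
print("fused peak:", fused_peak)
```

In this toy setup the low-level map alone locks onto the look-alike patch at (0, 0), while the fused map peaks at the true target position (2, 3). That is the kind of robustness to background interference that combining levels is meant to provide. A real tracker would use multi-channel deep features and would have to resample the levels to a common resolution before combining them.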