Computer Engineering and Applications, 2025, Vol. 61, Issue (12): 243-257. DOI: 10.3778/j.issn.1002-8331.2402-0076

• Graphics and Image Processing •

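Multi-Scale Feature Response Fusion for Complex Scene Infrared Target Tracking

XIONG Ruoyan, ZHANG Shang, ZHANG Yue

  1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, Hubei 443002, China
  2. Hubei Province Engineering Technology Research Center for Construction Quality Testing Equipment, China Three Gorges University, Yichang, Hubei 443002, China
  3. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
  • Publication date: 2025-06-15  Release date: 2025-06-13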

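Abstract: To address the performance degradation of infrared target tracking algorithms in complex scenes, this paper proposes an infrared target tracking algorithm based on multi-scale feature response fusion. Built on the Siamese network framework, the algorithm constructs a target template library with adaptive template updating to improve template-matching accuracy. A multi-scale feature extraction and fusion network is built on ResNet-50: a multi-branch structure captures features carrying different information, and grouped convolution with an increased cardinality per group extracts diverse deep features. Features of different scales are then fused through an adaptive weight allocation strategy. A global perception and rapid response module is proposed, which achieves global perception of the entire frame through a differentiable correlation filter layer while dynamically generating adaptive filters to capture target features. A multi-scale feature model of the infrared target is built from kernel-estimated probability histograms and compared with candidate regions at every forward propagation step, improving the algorithm's response speed and its perception of target changes. A spatial-channel-inter-frame interaction self-attention module is proposed, which lets the model focus better on global spatial features and high-response channels and uses inter-frame interaction attention to strengthen the complementarity of information between preceding and subsequent frames. Experiments were conducted on the LSOTB-TIR and PTB-TIR datasets. The results show that, across a variety of complex scenes, the proposed algorithm markedly improves target discrimination, perception, and anti-interference capability. Its success rate and precision reach 67.3% and 80.0%, respectively, on LSOTB-TIR, and 64.5% and 83.1%, respectively, on PTB-TIR, outperforming the compared tracking algorithms overall.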

Key words: infrared target tracking, computer vision, Siamese network, correlation filter
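The abstract does not say how the adaptive template library is updated. The sketch below shows one plausible policy, purely as an assumption: the first-frame template is always kept, and templates from frames whose tracking confidence reaches a threshold replace the oldest stored ones. `TemplateLibrary`, `capacity`, `threshold` and the confidence values are hypothetical, not the authors' design.

```python
from collections import deque


class TemplateLibrary:
    """Hypothetical template store for a Siamese tracker (an assumed policy).

    The first-frame template is always kept as an anchor; up to capacity - 1
    recent templates are stored, and the oldest is evicted when a new one arrives.
    """

    def __init__(self, first_template, capacity=4, threshold=0.8):
        self.anchor = first_template
        self.recent = deque(maxlen=capacity - 1)  # deque drops the oldest item when full
        self.threshold = threshold

    def update(self, template, confidence):
        """Store the template only if the tracker was confident on this frame."""
        if confidence >= self.threshold:
            self.recent.append(template)
            return True
        return False

    def templates(self):
        """Templates to match against the search region: anchor first, newest last."""
        return [self.anchor, *self.recent]


lib = TemplateLibrary("t0", capacity=3, threshold=0.8)
for name, conf in [("t1", 0.9), ("t2", 0.5), ("t3", 0.95), ("t4", 0.85)]:
    lib.update(name, conf)
print(lib.templates())  # ['t0', 't3', 't4']
```

Keeping the first-frame template fixed is a simple safeguard against drift if a poor template slips past the confidence check.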
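Grouped convolution with a larger cardinality (the ResNeXt design) lets a block use more, narrower parallel paths at roughly the same parameter cost. The abstract does not give the paper's group settings, so the worked count below simply compares the standard ResNet-50 bottleneck with the ResNeXt-50 32x4d bottleneck at a 256-channel stage. `conv_params` is a hypothetical helper that ignores biases and batch normalization.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution; each group sees c_in / groups input channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return k * k * (c_in // groups) * c_out


# ResNet-50 bottleneck: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256)
resnet_block = conv_params(256, 64, 1) + conv_params(64, 64, 3) + conv_params(64, 256, 1)

# ResNeXt 32x4d bottleneck: 1x1 (256->128), grouped 3x3 (128->128, 32 groups), 1x1 (128->256)
resnext_block = (conv_params(256, 128, 1)
                 + conv_params(128, 128, 3, groups=32)
                 + conv_params(128, 256, 1))

print(resnet_block, resnext_block)  # 69632 70144
```

Both blocks cost about 70k weights, but the grouped version is twice as wide (128 vs 64 channels) and splits its 3x3 stage into 32 paths of 4 channels each.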
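The adaptive weight allocation used to fuse scales is not specified in the abstract. A common way to realize such a step, assumed here, is to normalize one score per scale with a softmax and take the weighted sum of feature maps that have already been resized to a common shape. `fuse_features`, `scores` and the example values are hypothetical; in a trained network the scores would be learned or predicted from the features.

```python
import math


def softmax(scores):
    """Numerically stable softmax: turns arbitrary scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, scores):
    """Fuse same-length feature vectors (one per scale) with softmax-normalized weights."""
    if len({len(f) for f in features}) != 1:
        raise ValueError("features must be resized to a common shape before fusion")
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, features))
             for i in range(len(features[0]))]
    return fused, weights


# Three scales (hypothetically shallow, middle, deep), flattened to vectors of equal length
shallow, middle, deep = [1.0, 0.0, 2.0], [0.5, 1.0, 1.0], [0.0, 3.0, 1.0]
fused, weights = fuse_features([shallow, middle, deep], scores=[0.2, 0.5, 1.3])
print([round(w, 3) for w in weights], [round(v, 3) for v in fused])
```

The softmax keeps every weight positive and the total fixed at 1, so raising one scale's score necessarily lowers the contribution of the others.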
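One well-known formulation of a differentiable correlation filter layer is the closed-form ridge regression of MOSSE/DCF-style trackers (used inside a network by DCFNet): the filter is conj(H) = G * conj(X) / (X * conj(X) + lam), an element-wise division in the Fourier domain, so it can be recomputed for every new template and backpropagated through. The abstract does not describe the paper's layer, so the 1-D, pure-Python sketch below assumes this formulation; `train_filter`, `respond`, `sigma` and `lam` are hypothetical names, and a real layer would apply FFTs to 2-D CNN feature maps.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def train_filter(x, sigma=1.0, lam=1e-3):
    """Closed-form ridge-regression filter, returned as conj(H) in the Fourier domain.

    conj(H) = G * conj(X) / (|X|^2 + lam), where G is the spectrum of the desired
    response: a circular Gaussian peaked at index 0 (zero displacement).
    """
    n = len(x)
    g = [math.exp(-min(i, n - i) ** 2 / (2 * sigma ** 2)) for i in range(n)]
    X, G = dft(x), dft(g)
    return [Gk * Xk.conjugate() / (abs(Xk) ** 2 + lam) for Gk, Xk in zip(G, X)]


def respond(h_conj, z):
    """Correlation response of the learned filter on a new signal z."""
    Z = dft(z)
    return [v.real for v in idft([Zk * Hk for Zk, Hk in zip(Z, h_conj)])]


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # training signal (e.g. one feature row)
h = train_filter(x)
z = x[-3:] + x[:-3]                           # same signal, circularly shifted right by 3
r = respond(h, z)
print(max(range(len(r)), key=r.__getitem__))  # 3: the response peak follows the shift
```

Because the whole frame is handled by one Fourier-domain product, the response map covers every circular shift at once, which is what makes this kind of filter fast.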
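The abstract models the infrared target with kernel-estimated probability histograms and compares them with candidate regions. The classic version of this idea, from mean-shift tracking, weights each pixel by a kernel centred on the target before binning its intensity and scores two histograms with the Bhattacharyya coefficient. The sketch assumes exactly that (an Epanechnikov kernel, single-channel 8-bit intensities, one scale); the paper's kernel, bin count, similarity measure and multi-scale scheme are not given, and `kernel_histogram` and `bhattacharyya` are hypothetical names.

```python
import math


def kernel_histogram(patch, bins=16):
    """Epanechnikov-weighted intensity histogram of a grayscale patch (values 0..255).

    Pixels near the patch centre get larger weights than pixels near the border,
    which makes the model less sensitive to background at the box edges.
    """
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ry, rx = h / 2, w / 2  # half-sizes used to normalise coordinates to the unit ellipse
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            d2 = ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2
            weight = max(0.0, 1.0 - d2)  # Epanechnikov profile k(r) = 1 - r, zero outside
            hist[patch[y][x] * bins // 256] += weight
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def bhattacharyya(p, q):
    """Similarity of two normalised histograms in [0, 1]; 1 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


target = [[200] * 8 for _ in range(8)]             # uniform hot target
candidate = [[200] * 6 + [40] * 2 for _ in range(8)]  # same target, cold strip on the right
print(round(bhattacharyya(kernel_histogram(target), kernel_histogram(candidate)), 3))
```

A multi-scale variant could repeat this on patches cropped at several sizes around the candidate, but how the paper combines scales is not stated in the abstract.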
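The success rate and precision quoted in the abstract are, in the usual one-pass evaluation of OTB-style benchmarks, the area under the success plot (the fraction of frames whose bounding-box overlap exceeds a threshold, averaged over thresholds from 0 to 1) and the fraction of frames whose predicted centre lies within 20 pixels of the ground truth. The sketch assumes those standard definitions with boxes given as (x, y, w, h); the official LSOTB-TIR and PTB-TIR toolkits may differ in details, and the helper names and the toy boxes are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between box centres, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def precision(pred, gt, threshold=20.0):
    """Fraction of frames whose centre location error is at most `threshold` pixels."""
    return sum(center_error(p, g) <= threshold for p, g in zip(pred, gt)) / len(gt)


def success_auc(pred, gt, steps=21):
    """Area under the success plot: success rate averaged over IoU thresholds 0, 0.05, ..., 1."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


gt = [(10, 10, 20, 20)] * 3
pred = [(10, 10, 20, 20), (14, 10, 20, 20), (60, 60, 20, 20)]
print(round(precision(pred, gt), 3), round(success_auc(pred, gt), 3))  # 0.667 0.54
```

Precision only checks where the box centre is, while the success AUC also penalizes wrong box sizes, which is why trackers are usually reported on both.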
