Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (23): 173-180. DOI: 10.3778/j.issn.1002-8331.2410-0484

• Pattern Recognition and Artificial Intelligence •

Multi-Object Tracking Algorithm with Wide-Angle Feature Fusion Memory Network

ZHANG Beining (张贝宁), TANG Min (汤敏), LI Hongjun (李洪均), XIE Zhengguang (谢正光)

  1. School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019, China
  • Online: 2025-12-01 Published: 2025-12-01

Abstract: Multi-object tracking (MOT) in drone video is an important vision task with broad application prospects. However, the wide field of view, the small and distant targets, and the rapid target motion typical of drone footage limit the effectiveness of traditional methods. This paper proposes WideTrack, a Transformer-based MOT method. WideTrack uses a wide-angle feature fusion memory network to improve the capture of small, distant targets. To better adapt to drone motion characteristics, it introduces track confidence modeling into the filtering process. It also designs a data association method that combines a motion feature extraction model with a spatially informed WIoU matching algorithm, merging appearance and motion cues to track fast-moving targets. Experiments on the VisDrone-MOT and UAVDT datasets show that WideTrack outperforms existing methods: on VisDrone-MOT, it improves the MOTA score by 5.3 percentage points over the best existing model. The model reaches processing speeds of 16 FPS on VisDrone-MOT and 29 FPS on UAVDT, demonstrating its effectiveness for multi-object tracking in drone video.

Key words: multi-object tracking (MOT), Transformer, wide-angle target acquisition, tracklet confidence modeling
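
The abstract reports its headline result as a gain in MOTA measured in percentage points. As a reading aid, the following minimal sketch computes MOTA under its standard definition, MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames. The `FrameCounts` type, the function names, and every count below are hypothetical and invented for illustration; they are not results or code from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FrameCounts:
    """Per-frame tracking error counts (hypothetical container)."""
    gt: int    # number of ground-truth objects in the frame
    fn: int    # false negatives (missed objects)
    fp: int    # false positives (spurious detections)
    idsw: int  # identity switches


def mota(frames: Iterable[FrameCounts]) -> float:
    """Standard MOTA: 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    frames = list(frames)
    total_gt = sum(f.gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    total_errors = sum(f.fn + f.fp + f.idsw for f in frames)
    return 1.0 - total_errors / total_gt


def percentage_point_gain(baseline: float, improved: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return (improved - baseline) * 100.0


# Made-up counts for two trackers on the same two frames.
baseline: List[FrameCounts] = [FrameCounts(10, 3, 2, 1), FrameCounts(10, 2, 1, 0)]
improved: List[FrameCounts] = [FrameCounts(10, 2, 2, 1), FrameCounts(10, 2, 1, 0)]

if __name__ == "__main__":
    gain = percentage_point_gain(mota(baseline), mota(improved))
    print(f"baseline MOTA = {mota(baseline):.3f}")
    print(f"improved MOTA = {mota(improved):.3f}")
    print(f"gain = {gain:.1f} percentage points")
```

A gain in percentage points is an absolute difference between two MOTA scores, not a relative improvement, which is how the 5.3-point figure in the abstract should be read.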