Computer Engineering and Applications (计算机工程与应用), 2024, Vol. 60, Issue (6): 282-292. DOI: 10.3778/j.issn.1002-8331.2211-0385

• Graphics and Image Processing •

Multi-Object Tracking with Spatial-Temporal Embedding Perception and Multi-Task Synergistic Optimization

LIANG Xiaoguo (梁孝国), LI Hui (李辉), CHENG Yuanzhi (程远志), CHEN Shuangmin (陈双敏), LIU Hengyuan (刘恒源)

  1. School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China
  2. Faculty of Computing, Harbin Institute of Technology, Harbin 150000, China
  • Online: 2024-03-15 Published: 2024-03-15

Abstract: To address the challenges that frequent occlusion, crowded scenes, and variable object scales pose for multi-object tracking, a multi-object tracking method based on spatial-temporal embedding perception and multi-task synergistic optimization is proposed. First, a spatial correlation module is proposed to extract discriminative embeddings that are aware of object context in the spatial domain. Second, a temporal correlation module is proposed to aggregate the embeddings extracted by the spatial correlation module; the aggregated embeddings are used to generate temporal attention, which guides the spatial correlation module to extract more discriminative embeddings under frequent occlusion and in crowded scenes. As a result, the discriminative embeddings enhance association robustness and also allow more accurate detection boxes to be predicted, which mitigates the scale-variation problem, while the accurate detection boxes in turn help both modules extract higher-quality embeddings. In this way, embedding extraction, position prediction, and data association are optimized synergistically. Finally, the GIoU distance between detection boxes is introduced into the affinity matrix to further improve association robustness in occluded and crowded scenes. Experimental results on the MOT16, MOT17, and MOT20 datasets show that the proposed method achieves better tracking performance than state-of-the-art methods.

Key words: multi-object tracking, spatial-temporal embedding perception, position prediction, data association, synergistic optimization
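
The abstract states that a GIoU distance between detection boxes is introduced into the affinity matrix, but it does not give the fusion rule. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes boxes in `(x1, y1, x2, y2)` format, a cosine distance between embeddings, and a hypothetical mixing weight `lam`; the function names `giou_distance` and `fused_cost` are illustrative only.

```python
import numpy as np


def giou_distance(tracks, dets):
    """Pairwise GIoU distance (1 - GIoU) between two sets of boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    Returns an array of shape (len(tracks), len(dets)) with values in [0, 2].
    """
    t = np.asarray(tracks, dtype=float)[:, None, :]  # shape (T, 1, 4)
    d = np.asarray(dets, dtype=float)[None, :, :]    # shape (1, D, 4)

    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])

    # Intersection width/height, clipped at zero for non-overlapping boxes.
    iw = np.clip(np.minimum(t[..., 2], d[..., 2]) - np.maximum(t[..., 0], d[..., 0]), 0, None)
    ih = np.clip(np.minimum(t[..., 3], d[..., 3]) - np.maximum(t[..., 1], d[..., 1]), 0, None)
    inter = iw * ih
    union = area_t + area_d - inter

    # Area of the smallest axis-aligned box enclosing both boxes.
    cw = np.maximum(t[..., 2], d[..., 2]) - np.minimum(t[..., 0], d[..., 0])
    ch = np.maximum(t[..., 3], d[..., 3]) - np.minimum(t[..., 1], d[..., 1])
    enclose = cw * ch

    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou


def fused_cost(track_emb, det_emb, tracks, dets, lam=0.5):
    """Blend embedding cosine distance with GIoU distance.

    `lam` is a hypothetical weight; the source does not specify how the
    two terms are combined.
    """
    a = track_emb / np.linalg.norm(track_emb, axis=1, keepdims=True)
    b = det_emb / np.linalg.norm(det_emb, axis=1, keepdims=True)
    emb_dist = 1.0 - a @ b.T  # cosine distance, shape (T, D)
    return lam * emb_dist + (1.0 - lam) * giou_distance(tracks, dets)
```

Unlike plain IoU, the GIoU term still varies when two boxes do not overlap, so a track that has drifted slightly off its target during occlusion still receives a graded spatial cost. The resulting matrix could then be passed to an assignment solver such as `scipy.optimize.linear_sum_assignment`.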