Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (15): 122-132. DOI: 10.3778/j.issn.1002-8331.2304-0332

• Pattern Recognition and Artificial Intelligence •

Cross-Modal Pedestrian Re-Identification Guided by Complementary High and Low Salient Features

CHEN Ming (陈明), GUO Lijun (郭立君), ZHANG Rong (张荣)

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Online: 2024-08-01  Published: 2024-07-30

Abstract: Effectively mining the salient information in pedestrian images and mitigating the discrepancy between modalities are key to improving model performance in cross-modal (visible-infrared) person re-identification (VI-ReID). Existing work mainly adopts attention-based methods to strengthen the learning of discriminative features on the pedestrian body. However, such methods attend only to the most salient regions of a pedestrian and neglect the complementary secondary cues present in pedestrian images. This paper therefore proposes a saliency complementary feature guided network (SCFG-Net). A complementary feature salient mining (CFSM) module is designed to infer both the salient features of a pedestrian image, which carry global information, and the secondary cues overlooked by attention; these features are then fused to enrich the pedestrian representation and improve its discriminability. In addition, a cross-modal discriminative feature fusion (CDFF) module is designed to alleviate the color discrepancy between modalities. Experimental results show that the proposed method achieves significant performance gains on two public datasets. Under the all-search single-shot mode of the SYSU-MM01 dataset, it reaches 74.4% Rank-1 accuracy and 70.8% mAP.
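
As a rough illustration of the complementary-mining idea described in the abstract, the PyTorch sketch below computes a spatial saliency map, pools the most salient regions into a primary descriptor, erases the top-scoring positions and pools what remains as a secondary descriptor, and fuses the two. The module name CFSMSketch, the erase_ratio parameter, and the erase-then-pool strategy are illustrative assumptions, not the paper's actual CFSM design.

```python
# Minimal sketch of attention-guided complementary feature mining (assumed design,
# not the published CFSM): salient pooling + erased-region pooling + fusion.
import torch
import torch.nn as nn


class CFSMSketch(nn.Module):
    def __init__(self, channels: int, erase_ratio: float = 0.3):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # spatial saliency map
        self.fuse = nn.Linear(channels * 2, channels)        # fuse salient + secondary cues
        self.erase_ratio = erase_ratio

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map
        b, c, h, w = feat.shape
        saliency = torch.sigmoid(self.attn(feat))            # (B, 1, H, W)

        # Salient branch: attention-weighted global pooling.
        salient = (feat * saliency).flatten(2).mean(dim=2)   # (B, C)

        # Secondary branch: erase the top-k most salient positions,
        # then pool the regions the attention overlooked.
        flat = saliency.flatten(1)                            # (B, H*W)
        k = max(1, int(self.erase_ratio * h * w))
        thresh = flat.topk(k, dim=1).values[:, -1:]           # per-sample erase threshold
        keep = (flat < thresh).float().view(b, 1, h, w)       # mask out attention peaks
        secondary = (feat * keep).flatten(2).sum(dim=2) / keep.flatten(2).sum(dim=2).clamp(min=1.0)

        # Fuse the two complementary descriptors into one representation.
        return self.fuse(torch.cat([salient, secondary], dim=1))


if __name__ == "__main__":
    x = torch.randn(4, 2048, 18, 9)       # e.g. a ResNet-50 stage-4 feature map
    print(CFSMSketch(2048)(x).shape)       # torch.Size([4, 2048])
```

Erasing the attention peaks before the second pooling is what forces the secondary branch to describe regions the attention branch would otherwise ignore, which is the complementarity the abstract refers to.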

Key words: cross-modal, complementary features, secondary cues, feature fusion, re-identification (ReID)
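
On the color-discrepancy side, the abstract only states that the CDFF module fuses cross-modal discriminative features. As a hedged sketch of one common way to reduce reliance on color in the visible modality, the snippet below pairs each RGB image with a grayscale copy and fuses the two embeddings from a shared encoder. The class ColorRobustFusion and the grayscale pairing are assumptions for illustration and may differ from the paper's CDFF.

```python
# Sketch of color-robust feature fusion (assumed strategy, not the published CDFF):
# fuse features of an RGB image and its color-suppressed (grayscale) copy.
import torch
import torch.nn as nn


def to_grayscale_3ch(rgb: torch.Tensor) -> torch.Tensor:
    # rgb: (B, 3, H, W) in [0, 1]; standard luminance weights, replicated to 3 channels
    gray = (0.299 * rgb[:, 0] + 0.587 * rgb[:, 1] + 0.114 * rgb[:, 2]).unsqueeze(1)
    return gray.repeat(1, 3, 1, 1)


class ColorRobustFusion(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone                       # shared encoder for both views
        self.fuse = nn.Sequential(nn.Linear(feat_dim * 2, feat_dim), nn.BatchNorm1d(feat_dim))

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        f_rgb = self.backbone(rgb)                     # (B, feat_dim) color-aware feature
        f_gray = self.backbone(to_grayscale_3ch(rgb))  # (B, feat_dim) color-suppressed feature
        return self.fuse(torch.cat([f_rgb, f_gray], dim=1))


if __name__ == "__main__":
    dummy = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())   # stand-in encoder
    model = ColorRobustFusion(dummy, feat_dim=3)
    print(model(torch.rand(4, 3, 288, 144)).shape)                  # torch.Size([4, 3])
```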