Computer Engineering and Applications, 2024, Vol. 60, Issue (4): 133-141. DOI: 10.3778/j.issn.1002-8331.2209-0120

• Pattern Recognition and Artificial Intelligence •

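Cross-Modality Person Re-identification Combined with Data Augmentation and Feature Fusion

SONG Yu, WANG Banghai, CAO Ganggang

  1. College of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Published: 2024-02-15  Online: 2024-02-15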

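Abstract: The main difficulty of visible-infrared person re-identification lies in the large modality gap between images. Most existing methods alleviate this gap either by generating fake images with generative adversarial networks (GANs) or by extracting modality-shared features from the original images. However, training a GAN consumes substantial computational resources and the generated fake images tend to introduce noise, while extracting modality-shared features inevitably loses important discriminative features related to pedestrian identity. To address these problems, a new cross-modality person re-identification network is proposed. First, the training set is processed with automatic data augmentation and then used as the network input, which improves model robustness. Second, instance normalization is introduced into the network to reduce the modality gap. Finally, the pedestrian features of different scales extracted by the individual layers of the network are fused, so that the fused features contain more discriminative features related to pedestrian identity. The proposed method achieves Rank-1/mAP of 69.47%/65.05% in the all-search mode of the SYSU-MM01 dataset and 85.73%/77.77% in the visible-to-infrared mode of the RegDB dataset; the experimental results show significant improvements.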
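The abstract does not say which automatic data augmentation policy the first step uses. As a minimal sketch, assuming a RandAugment-style scheme (a swapped-in technique, not necessarily the authors' one: N operations drawn uniformly with replacement from a fixed pool and applied at one shared magnitude), the augmentation could look like the following. The operation pool, the 10-per-magnitude brightness step, and the names hflip, brighten, invert, OPS and rand_augment are all hypothetical.

```python
import random
from typing import Callable, List

Image = List[List[int]]  # grayscale image: rows of pixel values in [0, 255]


def hflip(img: Image, magnitude: int) -> Image:
    """Mirror each row horizontally; magnitude is unused."""
    return [row[::-1] for row in img]


def brighten(img: Image, magnitude: int) -> Image:
    """Add 10 * magnitude to every pixel and clip to [0, 255]."""
    return [[min(255, max(0, p + 10 * magnitude)) for p in row] for row in img]


def invert(img: Image, magnitude: int) -> Image:
    """Invert pixel intensities; magnitude is unused."""
    return [[255 - p for p in row] for row in img]


# Hypothetical operation pool (assumption): the paper's actual search space is not given.
OPS: List[Callable[[Image, int], Image]] = [hflip, brighten, invert]


def rand_augment(img: Image, n_ops: int, magnitude: int, rng: random.Random) -> Image:
    """Apply n_ops operations drawn uniformly, with replacement, from OPS."""
    for op in rng.choices(OPS, k=n_ops):
        img = op(img, magnitude)
    return img


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the sampled operations are reproducible
    image = [[0, 100], [200, 250]]
    print(rand_augment(image, n_ops=2, magnitude=3, rng=rng))
```

Every operation returns a new image of the same shape with values kept in [0, 255], so the augmented samples can be fed to the network exactly like the originals.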

Key words: cross-modality, person re-identification, automatic data augmentation, feature fusion
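The second step of the method relies on instance normalization (IN), which normalizes every channel of every single image with that image's own spatial mean and variance. Per-image intensity statistics, a crude proxy for modality-specific "style", are therefore removed. The pure-Python sketch below assumes a feature map stored as channels of flattened H*W values; where the paper places IN inside its network is not stated in the abstract. In PyTorch the same operation is provided by torch.nn.InstanceNorm2d.

```python
import math
from typing import List, Optional

FeatureMap = List[List[float]]  # one image: C channels, each a flattened list of H*W values


def instance_norm(x: FeatureMap, eps: float = 1e-5,
                  gamma: Optional[List[float]] = None,
                  beta: Optional[List[float]] = None) -> FeatureMap:
    """Normalize each channel of ONE image by that channel's own spatial statistics."""
    out: FeatureMap = []
    for c, channel in enumerate(x):
        n = len(channel)
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n  # biased variance
        std = math.sqrt(var + eps)
        g = 1.0 if gamma is None else gamma[c]  # optional learnable scale
        b = 0.0 if beta is None else beta[c]    # optional learnable shift
        out.append([g * (v - mean) / std + b for v in channel])
    return out


if __name__ == "__main__":
    visible = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    # Same content, but every channel rescaled and shifted (a stand-in for a modality change).
    infrared = [[3.0 * v + 7.0 for v in visible[0]], [0.5 * v - 2.0 for v in visible[1]]]
    a, b = instance_norm(visible), instance_norm(infrared)
    print(max(abs(p - q) for ca, cb in zip(a, b) for p, q in zip(ca, cb)))  # close to 0
```

Two feature maps that differ only by a positive per-channel scale and shift normalize to almost the same output (the small residual comes from eps), which is the property that lets IN shrink appearance differences between modalities.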

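The abstract states that pedestrian features of different scales from several network layers are fused, but not how. One simple possibility, assumed here rather than taken from the paper, is to global-average-pool each stage's feature map, L2-normalize each pooled vector so that no stage dominates by magnitude, and concatenate the results. The names global_avg_pool, l2_normalize and fuse_multiscale are hypothetical.

```python
import math
from typing import List

Stage = List[List[float]]  # one stage's output: C channels, each a flattened H*W list


def global_avg_pool(stage: Stage) -> List[float]:
    """Average each channel over its spatial positions -> one value per channel."""
    return [sum(ch) / len(ch) for ch in stage]


def l2_normalize(v: List[float], eps: float = 1e-12) -> List[float]:
    """Scale v to unit L2 norm; eps avoids division by zero for an all-zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def fuse_multiscale(stages: List[Stage]) -> List[float]:
    """Pool every stage, normalize each pooled vector, and concatenate (shallow to deep)."""
    fused: List[float] = []
    for stage in stages:
        fused.extend(l2_normalize(global_avg_pool(stage)))
    return fused


if __name__ == "__main__":
    shallow = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]  # 2 channels, 2x2 spatial
    deep = [[3.0], [4.0], [0.0]]                            # 3 channels, 1x1 spatial
    print(fuse_multiscale([shallow, deep]))  # 5-dim fused descriptor
```

Pooling makes stages with different spatial resolutions comparable, and concatenation keeps the fine-grained cues of shallow layers alongside the semantic cues of deep layers; learned weighting or attention-based fusion would be equally plausible alternatives.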

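The reported Rank-1 and mAP figures are standard re-identification metrics. The sketch below computes both from a query-by-gallery distance matrix: Rank-1 is the fraction of queries whose nearest gallery image has the same identity, and average precision (AP) averages the precision at each correct match in the ranked list. Official SYSU-MM01 and RegDB protocols add rules such as gallery sampling and averaging over repeated trials, which are deliberately omitted, so this only illustrates the definitions. The function name rank1_and_map is hypothetical, and results are fractions (multiply by 100 for the percentages in the abstract).

```python
from typing import List, Sequence, Tuple


def rank1_and_map(dist: List[List[float]], query_ids: Sequence[int],
                  gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Return (Rank-1, mAP) as fractions from a query x gallery distance matrix."""
    rank1_hits = 0
    aps: List[float] = []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])  # nearest gallery first
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # a query without any true match cannot be scored
        if matches[0]:
            rank1_hits += 1
        hits, precision_sum = 0, 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank  # precision at this correct match
        aps.append(precision_sum / hits)
    if not aps:
        raise ValueError("no query has a matching identity in the gallery")
    return rank1_hits / len(aps), sum(aps) / len(aps)


if __name__ == "__main__":
    query_ids = [1, 2]
    gallery_ids = [1, 2, 1]
    dist = [[0.1, 0.5, 0.9],   # query 0: correct, wrong, correct
            [0.2, 0.4, 0.1]]   # query 1: its only correct match ends up at rank 3
    print(rank1_and_map(dist, query_ids, gallery_ids))
```

For the toy example, Rank-1 is 1/2 = 0.5 and mAP is (5/6 + 1/3)/2 = 7/12, roughly 0.583: the first query scores AP = (1/1 + 2/3)/2, the second AP = 1/3.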