Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 133-141.DOI: 10.3778/j.issn.1002-8331.2209-0120

• Pattern Recognition and Artificial Intelligence •

Cross-Modality Person Re-identification Combined with Data Augmentation and Feature Fusion

SONG Yu, WANG Banghai, CAO Ganggang   

  1. College of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online:2024-02-15 Published:2024-02-15


Abstract: The difficulty of visible-infrared person re-identification lies in the large modality discrepancy between images. Most existing methods alleviate this discrepancy either by generating fake images with generative adversarial networks or by extracting modality-shared features from the original images. However, training a generative adversarial network consumes substantial computational resources and the generated fake images tend to introduce noise, while extracting only modality-shared features inevitably discards important discriminative features related to pedestrian identity. To address these problems, a new cross-modality person re-identification network is proposed. First, automatic data augmentation is applied to the training set to improve model robustness. Then, instance normalization is introduced into the network to reduce the modality discrepancy. Finally, the pedestrian features of different scales extracted by each layer of the network are fused, so that the fused representation retains more identity-related discriminative features. The proposed method achieves Rank-1/mAP of 69.47%/65.05% in the all-search mode of the SYSU-MM01 dataset and 85.73%/77.77% in the visible-to-infrared mode of the RegDB dataset, a significant improvement over existing methods.
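The two core steps described above — instance normalization to suppress modality-specific statistics, followed by fusion of features from different network stages — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the stage shapes, pooling choice, and concatenation-based fusion are assumptions for demonstration.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # Normalize each channel of a (C, H, W) feature map independently,
    # removing instance-specific (here: modality-specific) style statistics.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)

def global_avg_pool(x):
    # Collapse the spatial dimensions of a (C, H, W) map to a C-dim vector.
    return x.mean(axis=(1, 2))

def fuse_multiscale(features):
    # Concatenate pooled, normalized features from several backbone stages,
    # so the fused descriptor keeps both coarse and fine identity cues.
    return np.concatenate([global_avg_pool(instance_norm(f)) for f in features])

# Hypothetical feature maps from three stages of a CNN backbone
# (channel count grows while spatial resolution shrinks).
stage1 = np.random.rand(64, 32, 16)
stage2 = np.random.rand(128, 16, 8)
stage3 = np.random.rand(256, 8, 4)

descriptor = fuse_multiscale([stage1, stage2, stage3])
print(descriptor.shape)  # → (448,) = 64 + 128 + 256
```

In practice the fused descriptor would feed an identity classifier and a metric-learning loss; the point here is only that fusion preserves per-stage information instead of relying on the final layer alone.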

Key words: cross-modality, person re-identification, automatic data augmentation, feature fusion
