Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (15): 122-132. DOI: 10.3778/j.issn.1002-8331.2304-0332

• Pattern Recognition and Artificial Intelligence •

Cross-Modal Pedestrian Re-Identification Guided by Complementary High and Low Salient Features

CHEN Ming (陈明), GUO Lijun (郭立君), ZHANG Rong (张荣)

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Online: 2024-08-01  Published: 2024-07-30

Abstract: Effectively mining the salient information in pedestrian images and mitigating the discrepancy between modalities are key to improving model performance in cross-modal (visible-infrared) person re-identification (VI-ReID). Existing work mainly adopts attention-based methods to strengthen the learning of discriminative features on the pedestrian body. However, such methods attend only to the most salient regions of a pedestrian and neglect the complementary secondary cues present in pedestrian images. This paper therefore proposes a saliency complementary feature guided network (SCFG-Net). A complementary feature salient mining (CFSM) module is designed to infer both the salient features of a pedestrian image, which carry global information, and the secondary cues overlooked by attention; these features are then fused to enrich the pedestrian representation and improve its discriminability. In addition, a cross-modal discriminative feature fusion (CDFF) module is designed to alleviate the color discrepancy between modalities. Experimental results show that the proposed method achieves significant performance gains on two public datasets. Under the all-search single-shot mode of the SYSU-MM01 dataset, it reaches 74.4% Rank-1 accuracy and 70.8% mAP.
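
As a rough illustration of the complementary-mining idea described in the abstract, the PyTorch sketch below computes a spatial saliency map, pools the most salient regions into a primary descriptor, erases the top-scoring positions and pools what remains as a secondary descriptor, and fuses the two. The module name CFSMSketch, the erase_ratio parameter, and the erase-then-pool strategy are illustrative assumptions, not the paper's actual CFSM design.

```python
# Minimal sketch of attention-guided complementary feature mining (assumed design,
# not the published CFSM): salient pooling + erased-region pooling + fusion.
import torch
import torch.nn as nn


class CFSMSketch(nn.Module):
    def __init__(self, channels: int, erase_ratio: float = 0.3):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # spatial saliency map
        self.fuse = nn.Linear(channels * 2, channels)        # fuse salient + secondary cues
        self.erase_ratio = erase_ratio

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map
        b, c, h, w = feat.shape
        saliency = torch.sigmoid(self.attn(feat))            # (B, 1, H, W)

        # Salient branch: attention-weighted global pooling.
        salient = (feat * saliency).flatten(2).mean(dim=2)   # (B, C)

        # Secondary branch: erase the top-k most salient positions,
        # then pool the regions the attention overlooked.
        flat = saliency.flatten(1)                            # (B, H*W)
        k = max(1, int(self.erase_ratio * h * w))
        thresh = flat.topk(k, dim=1).values[:, -1:]           # per-sample erase threshold
        keep = (flat < thresh).float().view(b, 1, h, w)       # mask out attention peaks
        secondary = (feat * keep).flatten(2).sum(dim=2) / keep.flatten(2).sum(dim=2).clamp(min=1.0)

        # Fuse the two complementary descriptors into one representation.
        return self.fuse(torch.cat([salient, secondary], dim=1))


if __name__ == "__main__":
    x = torch.randn(4, 2048, 18, 9)       # e.g. a ResNet-50 stage-4 feature map
    print(CFSMSketch(2048)(x).shape)       # torch.Size([4, 2048])
```

Erasing the attention peaks before the second pooling is what forces the secondary branch to describe regions the attention branch would otherwise ignore, which is the complementarity the abstract refers to.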

Key words: cross-modal, complementary features, secondary cues, feature fusion, re-identification (ReID)
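
On the color-discrepancy side, the abstract only states that the CDFF module fuses cross-modal discriminative features. As a hedged sketch of one common way to reduce reliance on color in the visible modality, the snippet below pairs each RGB image with a grayscale copy and fuses the two embeddings from a shared encoder. The class ColorRobustFusion and the grayscale pairing are assumptions for illustration and may differ from the paper's CDFF.

```python
# Sketch of color-robust feature fusion (assumed strategy, not the published CDFF):
# fuse features of an RGB image and its color-suppressed (grayscale) copy.
import torch
import torch.nn as nn


def to_grayscale_3ch(rgb: torch.Tensor) -> torch.Tensor:
    # rgb: (B, 3, H, W) in [0, 1]; standard luminance weights, replicated to 3 channels
    gray = (0.299 * rgb[:, 0] + 0.587 * rgb[:, 1] + 0.114 * rgb[:, 2]).unsqueeze(1)
    return gray.repeat(1, 3, 1, 1)


class ColorRobustFusion(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone                       # shared encoder for both views
        self.fuse = nn.Sequential(nn.Linear(feat_dim * 2, feat_dim), nn.BatchNorm1d(feat_dim))

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        f_rgb = self.backbone(rgb)                     # (B, feat_dim) color-aware feature
        f_gray = self.backbone(to_grayscale_3ch(rgb))  # (B, feat_dim) color-suppressed feature
        return self.fuse(torch.cat([f_rgb, f_gray], dim=1))


if __name__ == "__main__":
    dummy = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())   # stand-in encoder
    model = ColorRobustFusion(dummy, feat_dim=3)
    print(model(torch.rand(4, 3, 288, 144)).shape)                  # torch.Size([4, 3])
```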