Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (12): 115-125. DOI: 10.3778/j.issn.1002-8331.2102-0033

• Pattern Recognition and Artificial Intelligence •


Collaborative Learning Method for Cross Modality Person Re-identification

CHEN Kunfeng, PAN Zhisong, WANG Jiabao, SHI Lei, ZHANG Jin, JIAO Shanshan   

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Online: 2021-06-15    Published: 2021-06-10


Abstract:

Cross-modality person re-identification is a key technology for realizing all-weather intelligent video surveillance systems. It aims to match visible-light and infrared images of a person with a given identity across non-overlapping camera views, and therefore suffers from large intra-class variation and a significant modality discrepancy. Existing methods struggle to overcome these two difficulties, largely because they neither effectively mine the discriminative power of features nor fully exploit multi-source heterogeneous information. To address these shortcomings, this paper applies collaborative learning to design a refined multi-source feature collaborative network, which extracts multiple complementary features and fuses their information to strengthen the network's learning ability. Multi-scale and multi-level features are extracted from the backbone convolutional network to realize refined feature collaborative learning, enhancing feature discriminability to cope with intra-class variation. In addition, a modality-shared and modality-specific feature collaboration module and a cross-modality human-semantic self-supervision module are designed to achieve multi-source feature collaborative learning, improving the utilization of multi-source heterogeneous image information and thereby mitigating the modality discrepancy. The effectiveness and superiority of the proposed method are verified on the SYSU-MM01 and RegDB datasets.
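
Note: to make the described design concrete, the following is a minimal PyTorch sketch of the general idea only — modality-specific shallow branches feeding a shared backbone, with features pooled from several stages and fused into a single embedding. It is an illustrative assumption, not the authors' implementation: the class name DualStreamMultiScaleNet, the layer split, and all dimensions are hypothetical, and the paper's modality shared/specific collaboration and human-semantic self-supervision modules are not reproduced here.

```python
# Minimal sketch (assumed, not the paper's code): modality-specific stems + shared
# deeper layers, with multi-level features pooled and fused into one embedding.
import torch
import torch.nn as nn
import torchvision.models as models


class DualStreamMultiScaleNet(nn.Module):
    def __init__(self, num_ids: int, feat_dim: int = 512):
        super().__init__()
        vis = models.resnet50(weights=None)   # visible-light branch (shallow stage only)
        ir = models.resnet50(weights=None)    # infrared branch (shallow stage only)
        # Modality-specific shallow layers capture modality-specific appearance cues.
        self.vis_stem = nn.Sequential(vis.conv1, vis.bn1, vis.relu, vis.maxpool, vis.layer1)
        self.ir_stem = nn.Sequential(ir.conv1, ir.bn1, ir.relu, ir.maxpool, ir.layer1)
        # Deeper layers are shared across modalities to learn modality-shared features.
        shared = models.resnet50(weights=None)
        self.layer2, self.layer3, self.layer4 = shared.layer2, shared.layer3, shared.layer4
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Fuse multi-level descriptors (outputs of layer2/3/4) into one embedding.
        self.fuse = nn.Linear(512 + 1024 + 2048, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_ids)

    def forward(self, x, modality: str):
        # modality: 'visible' or 'infrared' selects the modality-specific stem.
        x = self.vis_stem(x) if modality == 'visible' else self.ir_stem(x)
        f2 = self.layer2(x)
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        # Multi-scale pooling: collect global descriptors from several depths.
        feats = [self.pool(f).flatten(1) for f in (f2, f3, f4)]
        emb = self.fuse(torch.cat(feats, dim=1))
        return emb, self.classifier(emb)


if __name__ == "__main__":
    net = DualStreamMultiScaleNet(num_ids=100)   # placeholder identity count
    rgb = torch.randn(2, 3, 256, 128)            # visible-light person crops
    nir = torch.randn(2, 3, 256, 128)            # infrared crops (replicated to 3 channels)
    emb_v, logits_v = net(rgb, 'visible')
    emb_i, logits_i = net(nir, 'infrared')
    print(emb_v.shape, logits_v.shape)           # torch.Size([2, 512]) torch.Size([2, 100])
```

Splitting only the shallow stem per modality while sharing the deeper layers is one common way for dual-stream cross-modality re-identification networks to balance modality-specific and modality-shared representation learning; the fused multi-level embedding stands in for the richer feature collaboration described in the abstract.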

Key words: person re-identification, cross modality, collaborative learning, refined features, multi-source features, information fusion