Computer Engineering and Applications, 2022, Vol. 58, Issue (20): 132-140. DOI: 10.3778/j.issn.1002-8331.2103-0105

• Pattern Recognition and Artificial Intelligence •

结合多尺度特征学习与特征对齐的行人重识别

金子丰,卞春江,陈实   

  1. 中国科学院 国家空间科学中心 复杂航天系统综合电子与信息技术重点实验室,北京 100190
  2. 中国科学院大学 计算机科学与技术学院,北京 100049
  • 出版日期:2022-10-15 发布日期:2022-10-15

Person Re-Identification Based on Multi-Scale Feature Learning and Feature Alignment

JIN Zifeng, BIAN Chunjiang, CHEN Shi   

  1. Key Laboratory of Integrated Avionics and Information Technology for Complex Aerospace System, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2022-10-15 Published: 2022-10-15

摘要: 利用卷积神经网络对行人图像提取一个简单的全局特征,在复杂的行人重识别任务中无法获得令人满意的结果。局部特征学习的方式有助于获取更丰富的人体特征,但往往需要图像中的人体具有良好的空间对齐,而且,将人体各部分特征输入到独立的分支学习局部信息,忽略了人体各部分特征间的相关性,限制模型的性能提升。在此背景下,提出了一种新的多尺度特征学习算法,结合全局与局部特征学习得到更好的行人表示,提升复杂场景下模型的识别能力。对骨干网络不同深度输出的行人特征图,通过特征对齐模块对其执行空间变换,实现行人特征在空间上的矫正和对齐,进一步增强模型的泛化性能。在公开的大型行人重识别数据集上,与当前一些流行的方法进行了比较,验证了所提方法的有效性。

关键词: 深度学习, 卷积神经网络, 行人重识别, 多尺度特征学习, 行人对齐

Abstract: A single global feature extracted from a pedestrian image by a convolutional neural network cannot deliver satisfactory results in complex person re-identification tasks. Local feature learning helps capture richer human-body features, but it usually requires the bodies in the images to be well aligned spatially. Moreover, feeding the features of each body part into independent branches to learn local information ignores the correlation between the parts and limits further performance gains. To address this, a new multi-scale feature learning algorithm is proposed that combines global and local feature learning to obtain a better pedestrian representation and improve the model's recognition ability in complex scenes. A feature alignment module applies spatial transformations to the pedestrian feature maps output at different depths of the backbone network, correcting and aligning the pedestrian features in space and further enhancing the generalization performance of the model. Comparisons with several popular methods on large public person re-identification datasets verify the effectiveness of the proposed method.

Key words: deep learning, convolutional neural network (CNN), person re-identification, multi-scale feature learning, pedestrian alignment
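
The abstract describes the feature alignment module only as performing a spatial transformation on feature maps taken from different depths of the backbone. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes an affine warp with bilinear sampling in the style of a spatial transformer. The function name `affine_warp`, the 2x3 matrix `theta`, and the zero padding are all hypothetical choices made for this example; in a trained model `theta` would presumably be predicted by a small sub-network rather than fixed by hand.

```python
import math


def affine_warp(feature_map, theta):
    """Warp a 2-D feature map (list of rows) with a 2x3 affine matrix.

    theta maps normalized output coordinates (x, y) in [-1, 1] to normalized
    input coordinates. Samples that fall outside the map read as 0.
    This is an assumed, illustrative form of "spatial transformation".
    """
    h, w = len(feature_map), len(feature_map[0])

    def pixel(r, c):
        # Zero padding outside the map.
        if 0 <= r < h and 0 <= c < w:
            return feature_map[r][c]
        return 0.0

    def to_norm(i, n):
        # Index -> normalized coordinate; the first and last index map to -1 and 1.
        return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0

    def to_index(v, n):
        # Normalized coordinate -> fractional index.
        return (v + 1.0) * (n - 1) / 2.0

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            x, y = to_norm(c, w), to_norm(r, h)
            # Affine map from output location to source location.
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            fx, fy = to_index(src_x, w), to_index(src_y, h)
            x0, y0 = math.floor(fx), math.floor(fy)
            ax, ay = fx - x0, fy - y0
            # Bilinear interpolation over the four neighbouring cells.
            out[r][c] = (
                (1 - ax) * (1 - ay) * pixel(y0, x0)
                + ax * (1 - ay) * pixel(y0, x0 + 1)
                + (1 - ax) * ay * pixel(y0 + 1, x0)
                + ax * ay * pixel(y0 + 1, x0 + 1)
            )
    return out


fmap = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift_x = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # sample one column to the right

unchanged = affine_warp(fmap, identity)
shifted = affine_warp(fmap, shift_x)
```

Because the coordinates are normalized, the same `theta` can be applied to feature maps of different spatial resolution, which is one plausible way a single alignment step could serve several backbone depths. Whether the paper shares or separates the transformation across scales is not stated in the abstract.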