Person Re-Identification Based on Multi-Scale Feature Learning and Feature Alignment

doi:10.3778/j.issn.1002-8331.2103-0105

Abstract

A single global feature extracted from a pedestrian image by a convolutional neural network cannot achieve satisfactory results in complex person re-identification tasks. Local feature learning helps capture richer human-body features, but it usually requires the bodies in the images to be well aligned spatially. Moreover, feeding the features of each body part into independent branches to learn local information ignores the correlations between the parts and limits model performance. To address these issues, a new multi-scale feature learning algorithm is proposed that combines global and local feature learning to obtain a better pedestrian representation and improve recognition in complex scenes. A feature alignment module applies spatial transformations to the pedestrian feature maps output at different depths of the backbone network, spatially correcting and aligning the pedestrian features and further improving the generalization performance of the model. Comparisons with several popular methods on large public person re-identification datasets verify the effectiveness of the proposed method.

Key words: deep learning, convolutional neural network (CNN), person re-identification, multi-scale feature learning, pedestrian alignment

JIN Zifeng, BIAN Chunjiang, CHEN Shi. Person Re-Identification Based on Multi-Scale Feature Learning and Feature Alignment[J]. Computer Engineering and Applications, 2022, 58(20): 132-140.

