Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (13): 99-109. DOI: 10.3778/j.issn.1002-8331.2203-0498

• Pattern Recognition and Artificial Intelligence •

基于多粒度信息融合的无监督行人重识别方法

温静,张福康   

  1. 山西大学 计算机与信息技术学院,太原 030006
  • 出版日期:2023-07-01 发布日期:2023-07-01

Unsupervised Person Re-Identification Method Based on Multi-Granularity Information Fusion

WEN Jing, ZHANG Fukang   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Online:2023-07-01 Published:2023-07-01

摘要: 现有的无监督行人重识别算法通过残差网络仅能提取粗略的全局特征,但是随着数据集中行人、姿态数目和背景复杂性的激增,这些特征表明行人不同姿态的能力不足,使得模型出现欠拟合,进而导致识别精度下降。基于对上述问题的分析,从空间域和通道域两方面考虑,设计了一种全新的多粒度信息融合的残差块(multi-granularity information fusion residual block,MgIFR block),替换残差网络中常规的残差模块,并以此提出了一种基于多粒度信息融合的无监督行人重识别方法。MgIFR模块在空间域上借鉴自注意力机制的思想,通过卷积提取粗粒度的全局特征;结合这些全局特征和图像中特定像素处编码的query,得到具有像素级上下文信息的细粒度全局特征,将具有粗粒度和细粒度的两种全局特征相结合,得到行人姿态的显著性特征;在通道域上,利用通道注意力机制,对输入的残差特征和跨层特征进行通道加权融合,最终得到具有多粒度信息融合的特征,以此来提高模型应对不同行人姿态的能力。实验结果表明,在现有公开数据集中,特别是行人数目姿态多和背景更加复杂的数据集上,相较于基线模型,Rank-1最高提升了9个百分点,mAP最高提升了10.7个百分点。提出的MgIFR模块具有更好的行人姿态的区分能力,有效解决了行人的不同姿态导致误判的问题,提高了行人重识别的准确率。

关键词: 行人重识别, 多粒度, 残差块, 自注意力机制, 上下文信息, 特征融合, 无监督方法

Abstract: Existing unsupervised person re-identification algorithms extract only coarse global features with a residual network. As the number of persons and poses and the complexity of backgrounds in datasets grow sharply, these coarse features cannot adequately represent the different poses of a person, so the model underfits and recognition accuracy declines. To address this problem, an improved multi-granularity information fusion residual block (MgIFR block) is designed that takes both the spatial domain and the channel domain into account. It replaces the conventional residual block in the ordinary residual network, and an unsupervised person re-identification method based on MgIFR is proposed on this basis. In the spatial domain, the MgIFR block borrows the idea of the self-attention mechanism: first, coarse-grained global features are extracted by convolution; then, fine-grained global features containing pixel-level context information are obtained by combining the coarse global features with the queries encoded at specific pixels in the image; finally, the coarse-grained and fine-grained global features are merged to produce salient features of person poses. In the channel domain, a channel attention mechanism weights the fusion of the salient pose features and the cross-layer features, yielding multi-granularity fused features that improve the model's ability to handle different person poses. Experimental results on existing public datasets, especially those with more persons and poses and more complex backgrounds, show that Rank-1 and mAP improve over the baseline model by up to 9 percentage points and 10.7 percentage points, respectively. The proposed MgIFR block distinguishes person poses better, effectively solves the problem of misjudgment caused by different person poses, and improves the accuracy of person re-identification.
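
The channel-domain step in the abstract (channel-attention weighting of two feature maps before fusion) might be sketched as follows. This is an illustration under stated assumptions, not the authors' layer definition: the abstract does not give the gate's form, so the sketch assumes a squeeze-and-excitation style gate (global average pooling, a two-layer bottleneck, a sigmoid) and a convex combination `g * residual + (1 - g) * skip`. The names `fuse`, `channel_gate`, `w_down`, and `w_up` are hypothetical.

```python
import math

def global_avg_pool(feat):
    """feat: list of channels, each a list of rows. Returns one mean per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]

def linear(weights, vec):
    """Plain matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def channel_gate(desc, w_down, w_up):
    """Assumed SE-style gate: bottleneck + ReLU, then expansion + sigmoid."""
    hidden = [max(0.0, h) for h in linear(w_down, desc)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(w_up, hidden)]

def fuse(residual, skip, w_down, w_up):
    """Channel-weighted fusion of two C x H x W feature maps (assumed form)."""
    summed = [[[a + b for a, b in zip(ra, rb)]
               for ra, rb in zip(ca, cb)]
              for ca, cb in zip(residual, skip)]
    gate = channel_gate(global_avg_pool(summed), w_down, w_up)
    # Each channel gets its own weight g in (0, 1).
    return [[[g * a + (1.0 - g) * b for a, b in zip(ra, rb)]
             for ra, rb in zip(ca, cb)]
            for g, ca, cb in zip(gate, residual, skip)]

# Toy input: 2 channels, 1 x 2 spatial size, all-zero weights (gate = 0.5).
residual = [[[2.0, 4.0]], [[0.0, 0.0]]]
skip = [[[0.0, 0.0]], [[6.0, 8.0]]]
w_down = [[0.0, 0.0]]      # 1 hidden unit x 2 channels
w_up = [[0.0], [0.0]]      # 2 channels x 1 hidden unit
fused = fuse(residual, skip, w_down, w_up)
```

With all-zero weights every gate value is 0.5, so the toy output is simply the per-element average of the two inputs; trained weights would shift each channel toward whichever branch is more informative.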

Key words: person re-identification, multi-granularity, residual block, self-attention mechanism, context information, feature fusion, unsupervised method
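
The Rank-1 and mAP figures quoted in the abstract are standard retrieval metrics. A minimal sketch of how they are commonly computed is given below, assuming each query already has its gallery sorted by ascending distance and reduced to a list of booleans (True where the gallery image shares the query's identity). The function name `rank1_and_map` and this input format are illustrative assumptions; benchmark protocols typically also exclude gallery images taken by the query's own camera, which this sketch omits.

```python
def rank1_and_map(ranked_matches):
    """Return (Rank-1, mAP) over queries given per-query ranked match flags."""
    rank1_hits = 0
    ap_sum = 0.0
    valid = 0
    for matches in ranked_matches:
        n_pos = sum(matches)
        if n_pos == 0:
            continue  # a query with no true match contributes to neither metric
        valid += 1
        rank1_hits += int(matches[0])
        hits = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank  # precision at this recall point
        ap_sum += precision_sum / n_pos
    if valid == 0:
        raise ValueError("no query has a true match in the gallery")
    return rank1_hits / valid, ap_sum / valid

# Two toy queries: the first is matched at ranks 1 and 3, the second at rank 2.
rank1, mean_ap = rank1_and_map([[True, False, True], [False, True, False]])
```

For the toy input, the first query has average precision (1/1 + 2/3) / 2 and the second 1/2, so Rank-1 is 0.5 and mAP is their mean, about 0.667.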