Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (16): 94-104. DOI: 10.3778/j.issn.1002-8331.2311-0452

• Pattern Recognition and Artificial Intelligence •

Cross-Attention Fusion Learning of Transformer-CNN Features for Person Re-Identification

XIANG Jun, ZHANG Jincheng, JIANG Xiaoping, HOU Jianhua   

  1. Hubei Key Laboratory of Intelligent Wireless Communications, South-Central Minzu University, Wuhan 430074, China
  • Online: 2024-08-15 Published: 2024-08-15

Transformer-CNN特征跨注意力融合学习的行人重识别

项俊,张金城,江小平,侯建华   

  1. 中南民族大学 智能无线通信湖北省重点实验室,武汉 430074

Abstract: Convolutional neural networks (CNNs) focus on local features and have difficulty capturing global structural information, whereas Transformer networks model long-range feature dependencies but tend to overlook local feature details. This paper proposes a person re-identification algorithm based on cross-attention fusion learning that combines the strengths of CNN and Transformer feature learning networks, enriching the local features of pedestrians while improving the global feature representation. The proposed model consists of three parts: a CNN branch that mainly extracts local details; a Transformer branch that focuses on global feature information; and a cross-attention fusion branch that uses the self-attention mechanism to compute the correlation between the features of the two branches, fuses them accordingly, and thereby improves the representation ability of the model. Ablation experiments and experimental results on the Market1501 and DukeMTMC-reID datasets demonstrate the effectiveness of the proposed method.
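
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify learned projections, the number of heads, the fusion direction, or the pooling, so the bidirectional attention, residual addition, and mean pooling below are assumptions, and the names `cross_attention` and `fuse` are hypothetical. Each branch is represented as a list of token feature vectors of equal dimension.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(queries, context):
    """Scaled dot-product attention where queries come from one branch
    and keys/values come from the other (assumption: no learned projections)."""
    dim = len(queries[0])
    scores = matmul(queries, transpose(context))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, context), weights


def mean_pool(tokens):
    """Average the token vectors into a single feature vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse(cnn_tokens, transformer_tokens):
    """Assumed fusion: each branch attends to the other, the attended
    features are added back residually, then both are pooled and concatenated."""
    t_attended, _ = cross_attention(transformer_tokens, cnn_tokens)
    c_attended, _ = cross_attention(cnn_tokens, transformer_tokens)
    fused_t = [[a + b for a, b in zip(r, s)] for r, s in zip(transformer_tokens, t_attended)]
    fused_c = [[a + b for a, b in zip(r, s)] for r, s in zip(cnn_tokens, c_attended)]
    return mean_pool(fused_t) + mean_pool(fused_c)


# Toy inputs: 3 CNN tokens and 2 Transformer tokens, each of dimension 4.
cnn_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
transformer_tokens = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
descriptor = fuse(cnn_tokens, transformer_tokens)
```

In this toy setting, the first Transformer token places more attention weight on the first two CNN tokens than on the third, which is the sense in which the attention weights act as a correlation between the two branches' features.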

Key words: person re-identification, convolutional neural network (CNN), Transformer, cross-attention fusion learning

摘要: 卷积神经网络(convolutional neural network,CNN)关注局部特征,难以获得全局结构信息,Transformer网络建模长距离的特征依赖,但易忽略局部特征细节。提出了一种跨注意力融合学习的行人重识别算法,利用CNN和Transformer特征学习网络的特点,在丰富行人局部特征的同时改善特征的全局表达能力。该模型由三个部分构成:CNN分支主要提取局部细节信息;Transformer分支侧重于关注全局特征信息;跨注意力融合分支通过自注意力机制计算上述两个分支特征的相关性,进而实现特征融合,最终提高模型的表征能力。剥离实验以及在Market1501和DukeMTMC-reID数据集的实验结果证明了所提方法的有效性。

关键词: 行人重识别, 卷积神经网络(CNN), Transformer, 跨注意力融合学习