Adaptive Feature Fusion Embedding Network for Few Shot Fine-Grained Image Classification

doi:10.3778/j.issn.1002-8331.2204-0201

Abstract

Existing few-shot learning algorithms fail to fully extract the features of fine-grained images, which leads to low classification accuracy on such images. To better model the features extracted in metric-based few-shot fine-grained image classification (FSFGIC) algorithms, this paper proposes an FSFGIC algorithm based on adaptive feature fusion. First, an adaptive feature fusion embedding network is designed for feature extraction; it extracts deep, strongly semantic features together with shallow location and structure features, and selects key features using an adaptive algorithm and an attention mechanism. Second, the feature extraction network is trained with single-image training followed by multi-image training, so that it attends to the relationships between samples while extracting sample features. Finally, to bring feature vectors of the same class closer together in the feature space and push feature vectors of different classes farther apart, the extracted feature vectors undergo feature distribution transformation, QR (orthogonal-triangular) decomposition, and normalization. The proposed algorithm is compared with 9 other algorithms, and 5-way 1-shot and 5-way 5-shot accuracy is evaluated on multiple fine-grained datasets. Accuracy improves by 5.27 and 2.90 percentage points on the Stanford Dogs dataset and by 3.29 and 4.23 percentage points on the Stanford Cars dataset. On the CUB-200 dataset, the 5-way 1-shot accuracy is only 0.82 percentage points lower than that of DLG, while the 5-way 5-shot accuracy improves by 1.55 percentage points.

Key words: few-shot learning, fine-grained image classification, adaptive feature fusion, attention mechanism
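
The post-processing chain named in the abstract (feature distribution transformation, QR decomposition, normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the distribution transformation, so an element-wise power transform is assumed here, and the function names, the exponent `beta`, and the toy episode sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def power_transform(features, beta=0.5, eps=1e-6):
    # ASSUMPTION: the "feature distribution transformation" is modeled as an
    # element-wise power on non-negative activations; the paper may differ.
    return np.power(np.maximum(features, 0.0) + eps, beta)


def qr_reduce(features):
    # features has shape (n_samples, dim). Factor features.T = Q @ R.
    # The columns of R are the samples expressed in the orthonormal basis Q,
    # so pairwise inner products are preserved while the width becomes
    # min(n_samples, dim).
    _, r = np.linalg.qr(features.T)
    return r.T


def l2_normalize(features, eps=1e-12):
    # Scale every feature vector to unit Euclidean length.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def postprocess(features):
    # Apply the three steps in the order listed in the abstract.
    return l2_normalize(qr_reduce(power_transform(features)))


# Toy 5-way 1-shot episode (hypothetical sizes): 5 support + 10 query
# vectors of dimension 64, standing in for embedding-network outputs.
rng = np.random.default_rng(0)
raw = np.abs(rng.normal(size=(15, 64)))
processed = postprocess(raw)
print(processed.shape)  # (15, 15)
```

Because the QR step keeps all pairwise inner products, a distance-based classifier sees the same geometry before normalization, only in a lower-dimensional space when the episode has fewer samples than feature dimensions.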

XIE Yaohua, ZHANG Weichuan, REN Jie, JING Junfeng. Adaptive Feature Fusion Embedding Network for Few Shot Fine-Grained Image Classification[J]. Computer Engineering and Applications, 2023, 59(3): 184-192.

解耀华, 章为川, 任劼, 景军锋. 基于自适应特征融合的小样本细粒度图像分类[J]. 计算机工程与应用, 2023, 59(3): 184-192.
