Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (15): 151-159. DOI: 10.3778/j.issn.1002-8331.2204-0278

• Pattern Recognition and Artificial Intelligence •


Few-Shot Learning Method with Multi-Scale Feature Aggregation

ZENG Wu, MAO Guojun   

1. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
  2. Fujian Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
• Online: 2023-08-01   Published: 2023-08-01


Abstract: Most few-shot learning methods suffer from insufficient feature extraction, difficulty in accurately capturing the important feature information in samples, and intra-class sample diversity that can cause the class center point to deviate. To address these problems, a few-shot learning method with multi-scale feature aggregation (MSFA) is proposed. Specifically, the method uses a multi-scale generation module to generate feature information at multiple scales for all training samples, applies self-attention to aggregate the important feature information at each scale, and concatenates the important features across scales to obtain a more accurate feature representation of the image. Finally, for each query set sample, the distance to the class prototype and the average distance to the individual samples within the class are computed separately, and the final distance is obtained as their weighted combination. Extensive experiments are conducted on three datasets: miniImageNet, tieredImageNet, and Stanford Dogs. The results show that the proposed method greatly improves the classification performance of the baseline method; in particular, on miniImageNet under the 5-way 1-shot and 5-way 5-shot settings, the classification accuracy improves over the Prototypical Network by 7.42 and 6.28 percentage points, respectively.
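The two key computations described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dot-product self-attention, the Euclidean metric, and the weighting coefficient `alpha` are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_scales(scale_feats):
    """Aggregate per-scale features with self-attention, then concatenate.

    scale_feats: (s, d) array, one d-dim feature vector per scale.
    Returns an (s*d,) vector. A simplified scaled dot-product
    self-attention stands in for the paper's unspecified module.
    """
    d = scale_feats.shape[1]
    attn = softmax(scale_feats @ scale_feats.T / np.sqrt(d))
    return (attn @ scale_feats).reshape(-1)

def msfa_distance(query, support, alpha=0.5):
    """Weighted distance of a query sample to one class.

    query:   (d,) feature vector of the query sample.
    support: (k, d) feature vectors of the class's k support samples.
    alpha:   weight between the prototype distance and the mean
             per-sample distance (value here is a placeholder).
    """
    prototype = support.mean(axis=0)                 # class prototype
    d_proto = np.linalg.norm(query - prototype)      # distance to prototype
    d_mean = np.linalg.norm(support - query, axis=1).mean()  # mean distance to support samples
    return alpha * d_proto + (1 - alpha) * d_mean
```

At inference, the query would be assigned to the class with the smallest `msfa_distance`; classes with diverse support samples are penalized less by the prototype term alone because the per-sample term captures their spread.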

Key words: few-shot learning, feature enhancement, self-attention mechanism, multi-scale feature fusion, image classification