Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (2): 129-136. DOI: 10.3778/j.issn.1002-8331.2208-0373

• Pattern Recognition and Artificial Intelligence •

Category-Decoupled Few-Shot Classification Based on Graph Neural Networks

DENG Gelong (邓戈龙), HUANG Guoheng (黄国恒), CHEN Ziyan (陈紫嫣)

  1. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2024-01-15    Published: 2024-01-15

Abstract: Existing metric-based few-shot image classification models achieve reasonable few-shot learning performance, but they often overlook extracting the features of the raw data that are critical for classification. Redundant, classification-irrelevant information in the images is absorbed into the network parameters, which easily creates a performance bottleneck for metric-based few-shot image classification. To address this problem, a category-decoupled few-shot image classification model based on a graph neural network (VT-GNN) is proposed. The model combines image self-attention with a variational autoencoder supervised by the classification task as its image embedding module, so that the category-decoupled features of each original image form a node of a graph. A multilayer perceptron then constructs metric-informed edge features between the nodes, turning a set of few-shot training data into graph-structured data, and few-shot learning is carried out through the message-passing mechanism of the graph neural network. On the public Mini-Imagenet dataset, VT-GNN outperforms the baseline graph neural network model by 18.10 percentage points in the 5-way 1-shot setting and by 16.25 percentage points in the 5-way 5-shot setting.
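The embedding module described above pairs image self-attention with a variational autoencoder whose latent code is also supervised by a classification head, so that mainly class-relevant information survives in the node features. Below is a minimal sketch of how such a module could be wired together, assuming PyTorch; the class name AttentiveVAEEmbedding, the small convolutional backbone, the layer sizes, and the equal loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a classification-supervised VAE embedding with image self-attention.
# Assumes PyTorch; backbone, dimensions and loss weights are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveVAEEmbedding(nn.Module):
    """CNN features -> spatial self-attention -> VAE latent supervised by a classifier."""
    def __init__(self, latent_dim=64, num_classes=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Linear(latent_dim, 64)              # reconstructs pooled features
        self.classifier = nn.Linear(latent_dim, num_classes)  # classification supervision

    def forward(self, x):
        h = self.conv(x)                               # (B, 64, H', W')
        tokens = h.flatten(2).transpose(1, 2)          # (B, H'*W', 64) spatial tokens
        tokens, _ = self.attn(tokens, tokens, tokens)  # image self-attention
        pooled = tokens.mean(dim=1)                    # (B, 64) attended global feature
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return z, pooled, mu, logvar

# Toy usage: the three loss terms push the latent code to keep class-relevant content.
model = AttentiveVAEEmbedding(latent_dim=64, num_classes=5)
images = torch.randn(10, 3, 32, 32)
labels = torch.randint(0, 5, (10,))
z, pooled, mu, logvar = model(images)
recon_loss = F.mse_loss(model.decoder(z), pooled)
kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
cls_loss = F.cross_entropy(model.classifier(z), labels)
loss = recon_loss + kl_loss + cls_loss                 # weights would be tuned in practice
```

In the full model, the reconstruction, KL and classification terms would be balanced with tuned weights, and the resulting latent codes would serve as the graph nodes used in the episode graph sketched after the keywords.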

Key words: few-shot learning, graph neural network, variational autoencoder, image self-attention
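To make the graph side of the abstract concrete, the following sketch builds an episode graph from precomputed embeddings: a multilayer perceptron maps pairwise feature differences to metric-like edge weights, and one round of message passing refines the node features before the unlabelled query nodes are scored. The classes EdgeMLP and GNNLayer, the toy 5-way 1-shot episode with five queries, and all dimensions are assumptions for illustration rather than the paper's exact formulation.

```python
# Sketch of the episode graph: MLP edge features + one GNN message-passing step.
# Node embeddings are taken as given (e.g. from the embedding module above);
# shapes and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeMLP(nn.Module):
    """Turns pairwise node-feature differences into a dense metric-like adjacency."""
    def __init__(self, node_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(node_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, nodes):                                   # nodes: (N, D)
        diff = (nodes.unsqueeze(1) - nodes.unsqueeze(0)).abs()  # (N, N, D) pairwise diffs
        return torch.sigmoid(self.mlp(diff)).squeeze(-1)        # (N, N) edge weights

class GNNLayer(nn.Module):
    """One message-passing round: aggregate neighbours weighted by the edge features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.update = nn.Linear(2 * in_dim, out_dim)

    def forward(self, nodes, adj):
        adj = adj / adj.sum(dim=-1, keepdim=True)               # row-normalise weights
        messages = adj @ nodes                                   # (N, D) aggregated messages
        return F.relu(self.update(torch.cat([nodes, messages], dim=-1)))

# Toy 5-way 1-shot episode with 5 query images.
n_way, k_shot, n_query, feat_dim = 5, 1, 5, 64
z = torch.randn(n_way * k_shot + n_query, feat_dim)             # embeddings from the encoder
labels = torch.arange(n_way)                                     # one labelled support shot per class

# Node feature = embedding + one-hot label (zeros for the unlabelled query nodes).
onehot = torch.zeros(len(z), n_way)
onehot[:n_way * k_shot] = F.one_hot(labels, n_way).float()
nodes = torch.cat([z, onehot], dim=-1)

edge_mlp = EdgeMLP(nodes.shape[-1])
gnn = GNNLayer(nodes.shape[-1], 64)
readout = nn.Linear(64, n_way)

adj = edge_mlp(nodes)                                            # metric-informed edges
refined = gnn(nodes, adj)                                        # message passing over the episode
query_logits = readout(refined[n_way * k_shot:])                 # (n_query, n_way) class scores
print(query_logits.shape)                                        # torch.Size([5, 5])
```

During episodic training, a cross-entropy loss on query_logits against the query labels would be back-propagated through the edge MLP, the GNN layer and the embedding module end to end.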