Semi-supervised Generalized Zero-Shot Learning Using Modal Fusion

doi:10.3778/j.issn.1002-8331.2008-0349

Abstract

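Projection domain shift and biased prediction (toward seen classes) prevent existing methods from handling the generalized zero-shot learning (GZSL) challenge well. Building on the CADA-VAE model, this paper proposes a semi-supervised learning scheme based on modal fusion. The scheme offers one way to use unlabeled samples and semantic information to help the model perform intra-modal self-learning. It uses the latent vector space as a bridge for fusing the visual and semantic modalities. It also introduces two concepts, the visual centroid and the heterogeneous-class semantic latent vector, to guide mutual learning between the modalities. In the cross-reconstruction stage, a semantic latent vector is cross-reconstructed into visual features of its class, with the class's visual centroid as the axis. In the feature-encoding stage, visual features are encoded into latent vectors along the negative direction of the heterogeneous-class semantic latent vector. Together, these two steps keep the generated samples diverse without losing inter-class discriminability. Comparative experiments on three benchmark datasets show that the model outperforms current mainstream methods in recognition accuracy and copes well with scarce labeled samples.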
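The scheme builds on CADA-VAE (cross- and distribution-aligned VAE). CADA-VAE trains one VAE per modality and aligns their latent spaces with two losses:

- a cross-alignment loss, which reconstructs one modality's features from the other modality's latent vector;
- a distribution-alignment loss, based on the 2-Wasserstein distance between the latent Gaussians.

For diagonal Gaussians `N(mu_i, diag(sigma_i^2))`, that distance reduces to `sqrt(||mu_i - mu_j||^2 + ||sigma_i - sigma_j||^2)`. The standard-library sketch below computes this distance, together with the reparameterization step `z = mu + sigma * eps` used to draw latent vectors. It illustrates only the baseline CADA-VAE terms, not this paper's modifications, and the function names are our own.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], sigma: Sequence[float], rng: random.Random) -> List[float]:
    """VAE reparameterization: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def wasserstein2_diag(mu_a: Sequence[float], sigma_a: Sequence[float],
                      mu_b: Sequence[float], sigma_b: Sequence[float]) -> float:
    """2-Wasserstein distance between N(mu_a, diag(sigma_a**2)) and N(mu_b, diag(sigma_b**2)).

    For diagonal covariances, Sigma**0.5 = diag(sigma), so the Frobenius term is a
    plain sum of squared differences of standard deviations.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)


if __name__ == "__main__":
    # Toy latent Gaussians of a visual encoder and a semantic (attribute) encoder.
    mu_v, sigma_v = [0.0, 0.0], [1.0, 1.0]
    mu_s, sigma_s = [3.0, 4.0], [1.0, 1.0]
    print(wasserstein2_diag(mu_v, sigma_v, mu_s, sigma_s))  # 5.0
    z_v = reparameterize(mu_v, sigma_v, random.Random(0))  # one sampled visual latent vector
    print(len(z_v))  # 2
```

Because sampling goes through `mu + sigma * eps`, gradients can flow to the encoder outputs in a real autodiff framework. The plain-Python version only shows the arithmetic.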
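The abstract defines the visual centroid and the heterogeneous-class semantic latent vector only in words. The sketch below shows one plausible reading in plain Python. It rests on four assumptions, none of them taken from the paper:

- the visual centroid is the per-class mean of visual features;
- the heterogeneous class for a latent vector is the different class whose semantic latent vector lies nearest to it;
- "along the negative direction" means stepping away from that vector by a step size `alpha`;
- "with the visual centroid as the axis" means penalizing the squared distance between a cross-reconstructed feature and its class centroid.

All function names and `alpha` are hypothetical illustrations, not the paper's formulation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def visual_centroids(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """ASSUMPTION: the visual centroid of a class is the mean of its visual features."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def heterogeneous_class(z: Vector, own_class: int, semantic_latents: Dict[int, Vector]) -> int:
    """ASSUMPTION: the heterogeneous class is the nearest *other* class in latent space."""
    others = [c for c in semantic_latents if c != own_class]
    return min(others, key=lambda c: sq_dist(z, semantic_latents[c]))


def push_away(z: Vector, z_het: Vector, alpha: float) -> Vector:
    """ASSUMPTION: step z by alpha along -(z_het - z), i.e. directly away from z_het."""
    return [u - alpha * (w - u) for u, w in zip(z, z_het)]


def centroid_recon_loss(x_rec: Vector, centroid: Vector) -> float:
    """ASSUMPTION: anchor a cross-reconstructed visual feature to its class visual centroid."""
    return sq_dist(x_rec, centroid)


if __name__ == "__main__":
    feats = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
    cents = visual_centroids(feats, [0, 0, 1, 1])
    print(cents)  # {0: [1.0, 1.0], 1: [11.0, 10.0]}
    sem = {0: [0.0, 1.0], 1: [1.0, 0.0], 2: [5.0, 5.0]}
    z = [0.0, 0.0]
    het = heterogeneous_class(z, 0, sem)
    print(het, push_away(z, sem[het], 0.5))  # 1 [-0.5, 0.0]
    print(centroid_recon_loss([1.0, 2.0], cents[0]))  # 1.0
```

Choosing the nearest other class targets the pair most likely to be confused, which fits the stated aim of preserving inter-class discriminability. A randomly drawn other class, or a literal step `z - alpha * z_het`, would be equally valid readings of the abstract.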
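The biased-prediction problem in GZSL refers to models favoring seen classes at test time. GZSL evaluations commonly report two per-class average accuracies, one on seen test classes (S) and one on unseen test classes (U). They also report the harmonic mean `H = 2*S*U / (S + U)`, which stays low unless both S and U are high. The abstract does not state which metrics the paper reports. The sketch below therefore only illustrates these standard definitions, and its function names are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def per_class_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                       classes: Iterable[Hashable]) -> float:
    """Average over `classes` of the fraction of each class's samples predicted correctly.

    Predictions may range over all (seen + unseen) labels; `classes` only selects
    which true classes are averaged, e.g. the seen ones for S.
    """
    accs = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:  # skip classes with no test samples
            accs.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(accs) / len(accs) if accs else 0.0


def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """GZSL harmonic mean H = 2*S*U / (S + U); defined as 0 when both are 0."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


if __name__ == "__main__":
    # Class 0: half its samples are right; class 1: all are right.
    s = per_class_accuracy([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1])
    print(s)  # 0.75
    print(harmonic_mean(0.5, 0.5))  # 0.5
    print(round(harmonic_mean(0.9, 0.1), 4))  # 0.18
```

Compare a model heavily biased toward seen classes (S = 0.9, U = 0.1) with a balanced one (S = U = 0.5). Both have the same arithmetic mean of 0.5, but the biased model's harmonic mean drops to 0.18.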

Key words: generalized zero-shot learning, modal fusion, semi-supervised learning, visual centroid

LIN Shuang, WANG Xiaojun. Semi-supervised Generalized Zero-Shot Learning Using Modal Fusion[J]. Computer Engineering and Applications, 2022, 58(5): 163-171.
