Multi-Class Car Damage Image Generation Method Based on Few-Shot StyleGAN

doi:10.3778/j.issn.1002-8331.2206-0470

Abstract

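Abstract (translated from the Chinese): Existing car damage image datasets suffer from small sample sizes, insufficient diversity, and imbalanced class distributions, problems that image generation can alleviate. StyleGAN is a relatively recent method that generates high-resolution images without distortion and has been shown to be effective for augmenting medical and face images, but it has been little studied on small, highly diverse sample sets. This paper studies a conditional StyleGAN generation method for car damage images under few-shot, high-diversity conditions. The factors that affect convergence when training adversarial models on limited car damage samples are analyzed and the corresponding parameters optimized, reducing the FID to 41.3 on a multi-class car damage image dataset of about 1,500 samples at 128×128 resolution and resolving the failure of conventional methods to converge with so few samples. On this basis, three adversarial-model-based methods for generating multi-class car damage images are constructed: random generation, style-mixing generation, and decoupled-scaling generation. These three methods are used to augment the car damage training set, and numerical experiments demonstrate their effectiveness for the downstream image classification task. The paper also studies methods for decoupling the generative model's latent vectors, analyzes the physical meaning of the decoupled directions, and examines how and why different generation methods differ in the improvement they bring to image classification, providing clues and evidence for further improving adversarial-model-based generation of multi-class car damage images. The dataset and code are publicly available at https://github.com/derby-ding/StyleGAN_cardemage_class.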

Keywords (translated from the Chinese): image generation, generative adversarial network, few-shot learning, data augmentation

Abstract: Existing multi-class damaged car image datasets suffer from limited sample sizes, insufficient diversity, and unbalanced class distributions, problems that image generation can relieve. StyleGAN generates high-resolution images without distortion and has proven effective on medical and face images in particular, but little research has addressed few-shot settings with high sample diversity. This paper investigates a few-shot StyleGAN generation method for highly diverse car damage images. It first analyzes and optimizes the key parameters that affect the convergence of adversarial generative models trained on limited samples, achieving an FID of 41.3 on a dataset of about 1,500 damaged car images at 128×128 pixels. Building on this model, it proposes three damaged car image generation schemes: random generation, style-based generation, and decoupling-based generation. Experimental results show that the generated data improves damaged car image classification, which verifies the usefulness of the adversarial generative models. The paper further investigates latent vector decoupling in the generation space and the physical meaning of the decoupling directions, and analyzes how and why different generation methods differ in their effect on image classification. These analyses offer insights for further improving the generation models.

Key words: image generation, generative adversarial network, few-shot learning, data augmentation
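
The three generation schemes named in the abstract (random, style-based, and decoupling-based generation) can be read as three operations on per-layer latent codes. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names `sample_w`, `style_mix`, and `scale_along`, the layer count, and the tiny latent dimension are all hypothetical, and a plain Gaussian draw stands in for the output of a trained mapping network. A real pipeline would also pass a class label to the conditional generator and feed the resulting codes to the synthesis network.

```python
import random

NUM_LAYERS = 12  # assumption: number of per-layer style inputs at 128x128
LATENT_DIM = 8   # assumption: tiny size for illustration (real models use e.g. 512)


def sample_w(rng):
    """Random generation: draw one latent code and broadcast it to every layer.

    A Gaussian draw stands in for the mapping network output here.
    """
    w = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    return [list(w) for _ in range(NUM_LAYERS)]


def style_mix(w_a, w_b, crossover):
    """Style mixing: layers before `crossover` come from w_a, the rest from w_b."""
    return [list(layer) for layer in w_a[:crossover]] + [
        list(layer) for layer in w_b[crossover:]
    ]


def scale_along(w, direction, alpha):
    """Decoupling-based generation: shift every layer by alpha along one direction."""
    return [[x + alpha * d for x, d in zip(layer, direction)] for layer in w]


rng = random.Random(0)
w_a, w_b = sample_w(rng), sample_w(rng)

# Coarse layers (0-3) from w_a, finer layers (4-11) from w_b.
mixed = style_mix(w_a, w_b, crossover=4)

# Hypothetical decoupled direction: the first latent axis only.
direction = [1.0] + [0.0] * (LATENT_DIM - 1)
edited = scale_along(w_a, direction, alpha=2.0)
```

In this reading, the crossover index controls which source supplies coarse structure and which supplies fine detail, while `alpha` controls how far a sample moves along one decoupled direction.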

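DING Kai, YANG Jiaxi, YANG Yao, NA Chongning. Multi-class car damage image generation method based on few-shot StyleGAN[J]. Computer Engineering and Applications, 2023, 59(23): 202-210. (Translated from the Chinese citation.)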

DING Kai, YANG Jiaxi, YANG Yao, NA Chongning. Multi-Label Car Damage Image Generation Based on Few Shot StyleGAN[J]. Computer Engineering and Applications, 2023, 59(23): 202-210.
