Fashion Content and Style Transfer Based on Generative Adversarial Network

doi:10.3778/j.issn.1002-8331.2212-0265

Abstract

Abstract: Generative adversarial networks are widely used for image translation tasks such as image colorization, semantic synthesis, and style transfer. However, training current image generation models often depends on large paired datasets, and the models can only translate between two image domains. To address these problems, a fashion content and style transfer model based on a generative adversarial network (CS-GAN) is proposed. The model uses a contrastive learning framework to maximize the mutual information between fashion items and generated images, which ensures that content transfer is achieved without changing the structure of the fashion item. A layer-consistent dynamic convolution method adaptively learns style features for different style images, enabling arbitrary style transfer for fashion items. Content features (such as color and texture) and style features (such as Monet style and Cubism) are fused for the input fashion items to achieve translation across multiple image domains. Comparative experiments and result analysis are conducted on publicly available fashion data. Compared with other mainstream methods, the proposed method improves image synthesis quality, Inception score, and Fréchet Inception Distance (FID).

Key words: generative adversarial network, content and style transfer, feature fusion, multi-domain translation, layer-consistent dynamic convolution
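
The abstract states that a contrastive learning framework maximizes the mutual information between the fashion item and the generated image, but it does not give the loss itself. One widely used objective of this kind is InfoNCE, whose minimization maximizes a lower bound on mutual information. The sketch below is only an illustration under that assumption, not the authors' implementation: the function name `info_nce`, the use of cosine similarity, and the temperature value 0.07 are all hypothetical choices.

```python
import math


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for a single query feature vector (illustrative sketch).

    query     : feature of a patch in the generated image (list of floats)
    positive  : feature of the corresponding patch in the input item
    negatives : features of other, non-corresponding patches
    All vectors are assumed to be non-zero and of equal length.
    The temperature default is a hypothetical value, not from the paper.
    """

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # The positive pair is placed at index 0, followed by all negatives.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]

    # Cross-entropy with the positive as the target class,
    # computed with a numerically stable log-sum-exp.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]
```

The loss is small when the query is closest to its positive and grows when a negative is a better match, which is what pushes the generated image to keep the structure of the input item at corresponding locations.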

DING Wenhua, DU Junwei, HOU Lei, LIU Jinhuan. Fashion Content and Style Transfer Based on Generative Adversarial Network[J]. Computer Engineering and Applications, 2024, 60(9): 261-271.
