Research Progress of Image Style Transfer Based on Neural Network

doi:10.3778/j.issn.1002-8331.2309-0204

Abstract

Image style transfer is the process of re-rendering the content of a given image according to a style image, and it is an active research topic in computer vision within artificial intelligence. Traditional image style transfer methods are based mainly on physical modeling and texture synthesis; their results are coarse and they lack robustness. With the emergence of large image datasets and the introduction of a variety of deep learning architectures, many models and algorithms for image style transfer have been proposed. This paper analyzes the current state of image style transfer research, traces its development and latest advances, and, through comparative analysis, outlines directions for future research.

Keywords: image style transfer, deep learning, convolutional neural network, attention mechanism
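To make the abstract's definition of style transfer concrete, the sketch below shows one widely used way a neural network can remap content using a style image: adaptive instance normalization (AdaIN), introduced by Huang and Belongie in 2017. AdaIN replaces the per-channel mean and standard deviation of content feature maps with those of style feature maps. This is an illustrative sketch, not the method of this paper. It assumes feature maps already extracted by some CNN encoder (simulated here with random NumPy arrays of shape (C, H, W)), and the function name `adain` and the `eps` default are our own choices.

```python
import numpy as np


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive instance normalization (AdaIN) for feature maps shaped (C, H, W).

    Each channel of `content` is normalized to zero mean and unit standard
    deviation, then rescaled and shifted with the statistics of the
    corresponding channel of `style`. Spatial sizes may differ.
    """
    if content.ndim != 3 or style.ndim != 3:
        raise ValueError("expected arrays shaped (C, H, W)")
    if content.shape[0] != style.shape[0]:
        raise ValueError("content and style must have the same number of channels")

    # Per-channel statistics over the spatial dimensions (H, W).
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)  # eps avoids division by zero
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)

    # Keep the spatial layout of the content features, adopt the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for encoder features; a real pipeline would take these from a CNN.
    content_feat = rng.normal(0.0, 1.0, size=(3, 8, 8))
    style_feat = rng.normal(2.0, 0.5, size=(3, 6, 6))
    stylized = adain(content_feat, style_feat)
    print(stylized.shape)  # prints (3, 8, 8)
    print(stylized.mean(axis=(1, 2)), style_feat.mean(axis=(1, 2)))
```

Because only per-channel statistics are matched, the style features may have a different spatial size from the content features. In a complete AdaIN pipeline, a trained decoder would then map the transformed features back to an image.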

LIAN Lu, TIAN Qichuan, TAN Run, ZHANG Xiaohang. Research Progress of Image Style Transfer Based on Neural Network[J]. Computer Engineering and Applications, 2024, 60(9): 30-47.
