Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (9): 30-47. DOI: 10.3778/j.issn.1002-8331.2309-0204
LIAN Lu, TIAN Qichuan, TAN Run, ZHANG Xiaohang
Online: 2024-05-01
Published: 2024-04-29
Abstract: Image style transfer is the process of remapping the content of a given image according to the style of a reference image, and it is a research hotspot in computer vision within artificial intelligence. Traditional image style transfer methods rely mainly on physics-based and texture-synthesis techniques; their results are relatively coarse and their robustness is poor. With the emergence of large image datasets and the proposal of various deep learning network models, many image style transfer models and algorithms have appeared. Through an analysis of the current state of image style transfer research, this paper traces the field's development and its latest research progress, and, by means of comparative analysis, identifies future research directions for image style transfer.
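As an illustrative aside, the optimization-based formulation introduced by Gatys et al., the starting point for most deep style transfer methods the survey covers, matches VGG feature activations for content and Gram matrices of those activations for style. The PyTorch sketch below is a minimal illustration of that idea under assumed layer choices and hyperparameters; it is not code from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Load a pretrained VGG-19 and freeze it; its intermediate activations
# serve as the feature space for content and style comparisons.
vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Layer indices are assumptions for this sketch, following common practice:
# conv4_2 for content; conv1_1..conv5_1 for style.
CONTENT_LAYERS = {21}               # conv4_2 in torchvision's VGG-19
STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv1_1 .. conv5_1

def extract(x):
    """Run x through VGG and collect activations at the chosen layers."""
    content_feats, style_feats = [], []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in CONTENT_LAYERS:
            content_feats.append(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
    return content_feats, style_feats

def gram(feat):
    """Gram matrix of an (N, C, H, W) feature map, normalized by its size."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_transfer(content_img, style_img, steps=300, style_weight=1e6):
    """Optimize the pixels of a copy of content_img so its VGG features
    match the content image while its Gram matrices match the style image.
    Inputs are assumed to be ImageNet-normalized (1, 3, H, W) tensors."""
    target = content_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([target], lr=0.02)
    c_ref, _ = extract(content_img)
    _, s_ref = extract(style_img)
    s_ref = [gram(f) for f in s_ref]
    for _ in range(steps):
        opt.zero_grad()
        c_out, s_out = extract(target)
        content_loss = sum(F.mse_loss(a, b) for a, b in zip(c_out, c_ref))
        style_loss = sum(F.mse_loss(gram(a), b) for a, b in zip(s_out, s_ref))
        (content_loss + style_weight * style_loss).backward()
        opt.step()
    return target.detach()
```

Per-image optimization of this kind is slow; feed-forward approaches such as perceptual-loss networks and AdaIN, discussed in the survey, replace it with a single network pass, trading per-image flexibility for real-time speed.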
LIAN Lu, TIAN Qichuan, TAN Run, ZHANG Xiaohang. Research Progress of Image Style Transfer Based on Neural Network[J]. Computer Engineering and Applications, 2024, 60(9): 30-47.