Computer Engineering and Applications, 2023, Vol. 59, Issue (19): 21-39. DOI: 10.3778/j.issn.1002-8331.2211-0392
LAI Li’na, MI Yu, ZHOU Longlong, RAO Jiyong, XU Tianyang, SONG Xiaoning
Online: 2023-10-01
Published: 2023-10-01
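Abstract: With the spread of multi-sensor systems, multimodal data has drawn sustained attention from both research and industry, and deep learning techniques for processing multi-source modal information lie at the core of this work. Text-to-image generation is one direction of multimodal technology; because generative adversarial networks (GANs) produce more realistic images, text-to-image generation has made remarkable progress. It can be applied to image editing and colorization, style transfer, object deformation, photo enhancement, and other areas. This survey divides GANs by image-generation function into four categories: semantic-enhancement GANs, growable GANs, diversity-enhancement GANs, and clarity-enhancement GANs. Following the directions given by this taxonomy, it consolidates and compares function-based text-to-image generation models to clarify how the field has developed. It then analyzes existing evaluation metrics and commonly used datasets, and discusses the feasibility of handling complex text and related problems, along with future development trends. By systematically supplementing the analysis of GANs for text-to-image generation, the survey is intended to help researchers advance the field.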
Cite this article: LAI Li’na, MI Yu, ZHOU Longlong, RAO Jiyong, XU Tianyang, SONG Xiaoning. Survey About Generative Adversarial Network and Text-to-Image Synthesis[J]. Computer Engineering and Applications, 2023, 59(19): 21-39.