Computer Engineering and Applications, 2022, Vol. 58, Issue (10): 50-67. DOI: 10.3778/j.issn.1002-8331.2112-0151
• Research Hotspots and Reviews •
WANG Yuhao, HE Yu, WANG Zhu
Online: 2022-05-15
Published: 2022-05-15
王宇昊,何彧,王铸
WANG Yuhao, HE Yu, WANG Zhu. Overview of Text-to-Image Generation Methods Based on Deep Learning[J]. Computer Engineering and Applications, 2022, 58(10): 50-67.
王宇昊, 何彧, 王铸. 基于深度学习的文本到图像生成方法综述[J]. 计算机工程与应用, 2022, 58(10): 50-67.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2112-0151