
Computer Engineering and Applications, 2024, Vol. 60, Issue (24): 44-64. DOI: 10.3778/j.issn.1002-8331.2405-0048
• Research Hotspots and Reviews •
Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models
GAO Xinyu, DU Fang, SONG Lijuan
Online: 2024-12-15
Published: 2024-12-12
高欣宇, 杜方, 宋丽娟
GAO Xinyu, DU Fang, SONG Lijuan. Comparative Review of Text-to-Image Generation Techniques Based on Diffusion Models[J]. Computer Engineering and Applications, 2024, 60(24): 44-64.
高欣宇, 杜方, 宋丽娟. 基于扩散模型的文本图像生成对比研究综述[J]. 计算机工程与应用, 2024, 60(24): 44-64.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2405-0048