Computer Engineering and Applications, 2022, Vol. 58, Issue (23): 42-55. DOI: 10.3778/j.issn.1002-8331.2204-0441

• Research Hotspots and Reviews •

Text-to-Image Synthesis: Survey of State-of-the-Art

DENG Bo, HE Chunlin, XU Liming, SONG Lanyu   

  1. School of Computer Science, China West Normal University, Nanchong, Sichuan 637009, China
  • Online: 2022-12-01 Published: 2022-12-01




Abstract: Generative adversarial networks (GANs) are an important approach to image synthesis and currently the most widely used approach for text-to-image synthesis. As research on cross-modal generation has deepened, the realism and semantic relevance of images generated from text have improved greatly. Good results have been achieved in synthesizing natural images such as flowers, birds and human faces, as well as in generating scene graphs and layouts. Challenges remain, however: it is difficult to generate multiple objects in a complex scene, and existing evaluation metrics cannot accurately assess newly proposed text-to-image methods, so new metrics are needed. This paper reviews the development of text-to-image methods since the task was first proposed, and lists the methods, commonly used datasets and evaluation metrics proposed in recent years. Finally, open problems concerning datasets, metrics, methods and applications are discussed, and future research directions are outlined.

Key words: image synthesis, generative adversarial networks, text to image
