Computer Engineering and Applications (计算机工程与应用) ›› 2022, Vol. 58 ›› Issue (19): 14-36. DOI: 10.3778/j.issn.1002-8331.2205-0119

• 热点与综述 •

生成对抗网络及其文本图像合成综述

王威,李玉洁,郭富林,刘岩,何俊霖   

  1. 桂林电子科技大学 人工智能学院,广西 桂林  541000
  2. 郑州轻工业大学 计算机与通信工程学院,郑州  450002
  • 出版日期:2022-10-01 发布日期:2022-10-01

Survey About Generative Adversarial Network Based Text-to-Image Synthesis

WANG Wei, LI Yujie, GUO Fulin, LIU Yan, HE Junlin   

  1. School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, Guangxi 541000, China
  2. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
  • Online: 2022-10-01 Published: 2022-10-01

摘要: 随着深度学习的快速发展,基于生成对抗网络的文本图像合成领域成为了当下计算机视觉研究的热点。生成对抗网络同时包含生成器和鉴别器,通过两者的博弈来实现逼真数据的生成。受生成对抗网络的启发,近几年提出了一系列的文本图像合成模型,从图像质量、多样性、语义一致性方面不断取得突破。为推动文本图像合成领域的研究发展,对现有文本图像合成技术进行了全面概述。从文本编码、文本直接合成图像、文本引导图像合成方面对文本图像合成模型进行了分类整理,并详细探讨了各类基于生成对抗网络的代表性模型的模型框架和关键性贡献。分析了现有的评估指标和常用的数据集,提出了现有方法在复杂场景和文本、多模态、轻量化模型、模型评价方法等方面的不足和未来的发展趋势。总结了目前生成对抗网络在各领域的发展,重点关注了在文本图像合成领域的应用,可以作为一个研究人员进行图像合成研究时选择深度学习相关方法的权衡和参考。

关键词: 文本图像合成, 生成对抗网络, 文本编码, 深度学习

Abstract: With the rapid development of deep learning, text-to-image synthesis based on generative adversarial networks (GANs) has become a hot topic in computer vision research. A GAN consists of two neural networks, a generator and a discriminator, that compete against each other; through this game, the generator learns to produce realistic data. Inspired by GANs, a series of text-to-image synthesis models have been proposed in recent years, with successive breakthroughs in image quality, diversity, and semantic consistency. To promote research in this field, this paper presents a comprehensive overview of existing text-to-image synthesis techniques. The models are categorized by text encoding, direct text-to-image synthesis, and text-guided image synthesis, and the frameworks and key contributions of representative GAN-based models in each category are discussed in detail. Existing evaluation metrics and commonly used datasets are analyzed, and the shortcomings of current methods and future trends are identified with respect to complex scenes and texts, multimodality, lightweight models, and model evaluation methods. Finally, the paper summarizes the current development of GANs in various fields, focusing on their application to text-to-image synthesis, and can serve as a reference for researchers weighing deep learning methods for image synthesis research.

Key words: text-to-image synthesis, generative adversarial network, text encoding, deep learning