Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (12): 208-216. DOI: 10.3778/j.issn.1002-8331.2203-0326

• Graphics and Image Processing •

Text-to-Image Method Based on Attention Model with Increased Gate Mechanism

CHEN Jize, JIANG Xiaoyan, GAO Yongbin   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online: 2023-06-15   Published: 2023-06-15

Abstract: To address common problems in text-to-image generation, such as monotonous local textures, blurred edge details, and mismatch with the input text description, RAGAN, a text-to-image method based on an attention model with an added gate mechanism, is proposed. To overcome the inability of traditional methods to generate fine-grained images, an attention network with a gate mechanism filters out the relevant word vectors and combines them with the intermediate hidden vectors to form new hidden vectors; the adversarial game between the generator and discriminator then drives the generator to produce images with richer textures and clearer object edges, improving image quality. To address the mismatch between generated images and the input text description, text reconstruction is used to extract the deep semantic features embedded in the generated images and compare them with the semantic features of the input text, and a reconstruction loss is defined to improve semantic consistency. Compared with the baseline model, the Inception Score and R-precision improve by 9.17% and 8.3% on the CUB dataset and by 13.67% and 5.56% on the COCO dataset, demonstrating that the proposed model effectively improves the realism and artistry of the generated images while maintaining semantic consistency.
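
As a rough illustration of the two components summarized in the abstract, the following minimal PyTorch-style sketch shows (a) a gate that scores each word's relevance and filters the word vectors before they are fused with the intermediate hidden vectors, and (b) a reconstruction loss that compares semantic features of the input text with features re-extracted from the generated image. All module names, tensor shapes, and the cosine-similarity formulation are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedWordAttention(nn.Module):
    # Hypothetical sketch of a gated attention step: a gate scores each word's
    # relevance to the current hidden state and suppresses irrelevant words
    # before the attended word context is fused with the hidden vectors.
    def __init__(self, word_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(word_dim, hidden_dim)      # map word vectors into hidden space
        self.gate = nn.Linear(word_dim + hidden_dim, 1)  # per-word relevance gate

    def forward(self, words, hidden):
        # words:  (B, T, word_dim)   word embeddings of the input sentence
        # hidden: (B, N, hidden_dim) intermediate hidden vectors (e.g. image sub-regions)
        w = self.proj(words)                                             # (B, T, hidden_dim)
        attn = F.softmax(torch.bmm(hidden, w.transpose(1, 2)), dim=-1)   # (B, N, T)

        # gate in (0, 1): filter out words irrelevant to the current hidden state
        summary = hidden.mean(dim=1, keepdim=True).expand(-1, words.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([words, summary], dim=-1)))  # (B, T, 1)
        context = torch.bmm(attn, w * g)                                   # (B, N, hidden_dim)

        # new hidden vectors: original features concatenated with the gated word context
        return torch.cat([hidden, context], dim=-1)                        # (B, N, 2 * hidden_dim)

def reconstruction_loss(text_feat, recon_text_feat):
    # Hypothetical semantic-consistency loss: compare features of the input text
    # with semantic features re-extracted ("reconstructed") from the generated image.
    return 1.0 - F.cosine_similarity(text_feat, recon_text_feat, dim=-1).mean()

In a multi-stage generator such a gated attention module would typically be applied at each refinement stage, and the reconstruction loss would be added to the adversarial losses when training the generator; the sketch collapses this to a single step.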

Key words: attention mechanism, convolutional neural network, generative adversarial network, deep learning
