[1] WANG S, SAHARIA C, MONTGOMERY C, et al. Imagen editor and EditBench: advancing and evaluating text-guided image inpainting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 18359-18369.
[2] LI C, ZHANG C, WAGHWASE A, et al. Generative AI meets 3D: a survey on text-to-3D in AIGC era[J]. arXiv:2305.06131, 2023.
[3] KO H K, PARK G, JEON H, et al. Large-scale text-to-image generation models for visual artists’ creative works[C]//Proceedings of the 28th International Conference on Intelligent User Interfaces. New York: ACM, 2023: 919-933.
[4] CAO Y, QIN J P, GAO T, et al. Two-stage method for generating high-quality images from text based on generative adversarial networks[J]. Journal of Zhejiang University (Engineering Science), 2024, 58(4): 674-683. (in Chinese)
[5] YANG B, XIANG X Q, KONG W Z, et al. DMF-GAN: deep multimodal fusion generative adversarial networks for text-to-image synthesis[J]. IEEE Transactions on Multimedia, 2024, 26: 6956-6967.
[6] QIAO T T, ZHANG J, XU D Q, et al. MirrorGAN: learning text-to-image generation by redescription[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1505-1514.
[7] XU T, ZHANG P C, HUANG Q Y, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks[J]. arXiv:1711.10485, 2017.
[8] ZHU M F, PAN P B, CHEN W, et al. DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5795-5803.
[9] KANG M, ZHU J Y, ZHANG R, et al. Scaling up GANs for text-to-image synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 10124-10134.
[10] TAO M, BAO B K, TANG H, et al. GALIP: generative adversarial CLIPs for text-to-image synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 14214-14223.
[11] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
[12] ZHANG H, XU T, LI H, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5908-5916.
[13] TAO M, TANG H, WU F, et al. DF-GAN: a simple and effective baseline for text-to-image synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 16494-16504.
[14] YE S, WANG H, TAN M, et al. Recurrent affine transformation for text-to-image synthesis[J]. IEEE Transactions on Multimedia, 2024, 26: 462-473.
[15] LIU A A, SUN Z, XU N, et al. Prior knowledge guided text to image generation[J]. Pattern Recognition Letters, 2024, 177: 89-95.
[16] REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning, 2016: 1060-1069.
[17] CHEN J Z, JIANG X Y, GAO Y B. Text-to-image generation method based on attention model with gate mechanism[J]. Computer Engineering and Applications, 2023, 59(12): 208-216. (in Chinese)
[18] TAN H, YIN B, WEI K, et al. ALR-GAN: adaptive layout refinement for text-to-image synthesis[J]. IEEE Transactions on Multimedia, 2023, 25: 8620-8631.
[19] LIAO W, HU K, YANG M Y, et al. Text to image generation with semantic-spatial aware GAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 18166-18175.
[20] SHEYNIN S, ASHUAL O, POLYAK A, et al. KNN-Diffusion: image generation via large-scale retrieval[J]. arXiv:2204.02849, 2022.
[21] XUE Z, SONG G, GUO Q, et al. RAPHAEL: text-to-image generation via large mixture of diffusion paths[C]//Advances in Neural Information Processing Systems, 2023: 41693-41706.
[22] ZHAO S, CHEN D, CHEN Y C, et al. Uni-ControlNet: all-in-one control to text-to-image diffusion models[C]//Advances in Neural Information Processing Systems, 2023: 11127-11150.
[23] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[24] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[25] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2818-2826.
[26] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset[EB/OL]. (2022-08-12)[2024-05-26]. https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf.
[27] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 13th European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
[28] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640.
[29] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 2234-2242.
[30] ZHANG Z, SCHOMAKER L. DTGAN: dual attention generative adversarial networks for text-to-image generation[C]//Proceedings of the International Joint Conference on Neural Networks. Piscataway: IEEE, 2021: 1-8.
[31] DING M, ZHENG W, HONG W, et al. CogView2: faster and better text-to-image generation via hierarchical transformers[C]//Advances in Neural Information Processing Systems, 2022: 16890-16902.