Research on Method of Animated Avatar Generation Based on Multi-Level Generative Adversarial Networks

doi:10.3778/j.issn.1002-8331.2010-0139

Abstract

Abstract: Existing anime image generation methods suffer from a lack of diversity in synthesized images, unclear local textures, small sample variance, and difficulty generating images from detailed descriptions. Building on the idea of StackGAN++ and combining it with auxiliary classifiers, this paper proposes an improved model, ACM-GAN (auxiliary classification attached multi-level generative adversarial networks, a multi-level structure with auxiliary classifiers), for anime character avatar generation. The network consists of two generators and two discriminators stacked together, with an auxiliary classifier embedded in the discriminators to constrain the generated results, which increases the variance and therefore the diversity of the generated samples. To keep the synthesized images realistic and sharp, a feature-map-space loss and an image-pixel-space mean-variance loss are introduced to minimize the distance between synthetic and real data. The experimental results show that the multi-level structure effectively stabilizes the training process and enriches the edge details and local textures of the images, while the auxiliary classifier effectively addresses the mode collapse problem and improves the accuracy of generating images of a specified category. The FID score of the images generated by ACM-GAN reaches 27.96, an improvement of 23.1% over StackGAN++.

Key words: anime avatar generation, generative adversarial networks, multi-level structure, auxiliary classifier
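
The abstract does not give the exact form of the image-pixel-space mean-variance loss, so the following is only a minimal sketch of the general idea, assuming the loss penalizes the squared gap between the pixel means and between the pixel variances of real and generated data. The function name `mean_variance_loss` and the toy pixel values are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def mean_variance_loss(real_pixels, fake_pixels):
    """Illustrative (assumed) loss: squared difference of the means
    plus squared difference of the population variances."""
    mean_gap = fmean(real_pixels) - fmean(fake_pixels)
    var_gap = pvariance(real_pixels) - pvariance(fake_pixels)
    return mean_gap ** 2 + var_gap ** 2


# Toy pixel intensities in [0, 1]; hypothetical values for illustration only.
real = [0.2, 0.4, 0.6, 0.8]
fake_close = [0.25, 0.4, 0.6, 0.75]    # similar spread to the real data
fake_collapsed = [0.5, 0.5, 0.5, 0.5]  # same mean, but no variance at all

loss_close = mean_variance_loss(real, fake_close)
loss_collapsed = mean_variance_loss(real, fake_collapsed)
print(loss_close, loss_collapsed)
```

In this toy setting, a generated sample that matches the real mean but has collapsed to a single value is penalized more heavily than one whose spread is close to the real data, which is the kind of behavior a variance-matching term is meant to encourage.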


GAO Wenchao, REN Shengbo, TIAN Chi, ZHAO Shanshan. Research on Method of Animated Avatar Generation Based on Multi-Level Generative Adversarial Networks[J]. Computer Engineering and Applications, 2022, 58(9): 230-237.

