Multimodal Animation Style Transfer Method Fused with Attention Mechanism

doi:10.3778/j.issn.1002-8331.2204-0338

Abstract

Some current methods do not match style features to the content structure of the image. As a result, when they transfer animation style to images with complex semantic information and salient features, the generated images show insufficiently rich style colors, artifacts, and loss of some content details. This paper proposes MastGAN-CBAM, a multimodal animation style transfer method fused with an attention mechanism. The method clusters animation image features into several sub-feature components, uses the GraphCut algorithm to match these components with local content image features, and then uses the Gram matrix to compute the style loss of the matched features, thereby constructing a multimodal style loss function. Because this style loss adapts to the multimodal features of the image, the network parameters can be optimized and adjusted more effectively. The method also introduces a hybrid-domain attention mechanism, which improves the efficiency and accuracy of the model and further improves the animation style transfer results. Experimental results show that images generated by this method retain more complete details, exhibit a more pronounced animation style, and contain fewer artifacts, so the overall animation effect improves to some extent. In experiments on three animation datasets, including “Spirited Away”, the method reaches FID scores of 164.89, 162.02, and 199.37, respectively, and it also achieves good results in video animation style transfer.
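The multimodal style loss outlined in the abstract can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: features are taken to be flattened `(n, c)` arrays, the clustering is plain k-means, and a nearest-centre assignment stands in for the GraphCut matching step, which additionally enforces spatial smoothness. The function names `gram_matrix`, `nearest_centre`, `kmeans`, and `multimodal_style_loss` are hypothetical.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of an (n, c) feature set, normalised by n."""
    return features.T @ features / max(len(features), 1)


def nearest_centre(features, centres):
    """Index of the closest centre for every feature vector."""
    # Squared Euclidean distances, shape (n, k).
    dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means on (n, c) features; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centres = features[start].astype(float)
    for _ in range(iters):
        labels = nearest_centre(features, centres)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres, nearest_centre(features, centres)


def multimodal_style_loss(content_feats, style_feats, k=3):
    """Sum of per-cluster Gram-matrix losses.

    Assumption: nearest-centre matching replaces GraphCut here.
    """
    centres, style_labels = kmeans(style_feats, k)
    content_labels = nearest_centre(content_feats, centres)
    loss = 0.0
    for j in range(k):
        c = content_feats[content_labels == j]
        s = style_feats[style_labels == j]
        if len(c) == 0 or len(s) == 0:
            continue  # nothing matched to this style component
        diff = gram_matrix(c) - gram_matrix(s)
        loss += float((diff ** 2).sum())
    return loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    style = rng.normal(size=(200, 8))    # stand-in for style features
    content = rng.normal(size=(150, 8))  # stand-in for content features
    print(multimodal_style_loss(content, style, k=3))
```

In a training setup the first argument would presumably be the features of the generated image rather than random data, and the loss would be computed in an automatic-differentiation framework so that it can drive the generator's parameters.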

Key words: deep learning; animation style transfer; generative adversarial networks; multimodal matching; attention mechanism

NIE Xiongfeng, WANG Junying, DONG Fangmin, ZANG Zhaoxiang, JIANG Shu. Multimodal Animation Style Transfer Method Fused with Attention Mechanism[J]. Computer Engineering and Applications, 2023, 59(15): 223-234.

