Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (15): 223-234. DOI: 10.3778/j.issn.1002-8331.2204-0338

• Graphics and Image Processing •


Multimodal Animation Style Transfer Method Fused with Attention Mechanism

NIE Xiongfeng, WANG Junying, DONG Fangmin, ZANG Zhaoxiang, JIANG Shu   

  1. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
    2. Hubei Construction Quality Inspection Equipment Engineering Technology Research Center, China Three Gorges University, Yichang, Hubei 443002, China
    3. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, Hubei 443002, China
  • Online: 2023-08-01  Published: 2023-08-01


Abstract: Because they do not match the content structure of the image, some current methods, when transferring animation style onto images with complex semantic information and salient features, generate images that suffer from impoverished style colors, artifacts, and loss of some content details. This paper proposes MastGAN-CBAM, a multimodal animation style transfer method fused with an attention mechanism. It clusters animation image features into several sub-feature components, uses the GraphCut algorithm to match these components with local content image features, and then uses Gram matrices to compute the style loss of these features, thereby constructing a multimodal style loss function. Because this style loss adapts to the multimodal features of the image, the network parameters can be optimized and adjusted more effectively. In addition, the method introduces a hybrid-domain attention mechanism, which improves the efficiency and accuracy of the model and further improves the animation style transfer effect. Experimental results show that the images generated by this method have more complete details, a more pronounced animation style, and fewer artifacts, so the animation effect is improved to a certain extent. On three animation datasets, including Spirited Away, the method achieves FID scores of 164.89, 162.02 and 199.37 respectively, and it also performs well on video animation style transfer.
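The abstract describes the loss construction only at a high level, so the following is a minimal sketch of the multimodal style loss idea, not the authors' implementation: style features are clustered into k sub-feature components (plain k-means here, with nearest-centroid assignment standing in for the paper's GraphCut-based matching of components to local content features), and a Gram-matrix style loss is computed per matched component. The function names, the value of k, and the size-based weighting are all illustrative assumptions.

```python
import torch

def gram(feat):
    # feat: (C, N) matrix of N feature vectors; returns the (C, C)
    # channel-correlation (Gram) matrix, normalized by its size.
    c, n = feat.shape
    return feat @ feat.t() / (c * n)

def multimodal_style_loss(content_feat, style_feat, k=3, iters=10):
    # content_feat, style_feat: (C, H, W) maps from the same encoder layer.
    c = style_feat.shape[0]
    s = style_feat.reshape(c, -1)    # (C, Ns) style feature vectors
    f = content_feat.reshape(c, -1)  # (C, Nc) content feature vectors

    # k-means over the style feature vectors (columns of s).
    centroids = s[:, torch.randperm(s.shape[1])[:k]].t().clone()  # (k, C)
    for _ in range(iters):
        assign = torch.cdist(s.t(), centroids).argmin(dim=1)      # (Ns,)
        for j in range(k):
            members = s.t()[assign == j]
            if len(members):
                centroids[j] = members.mean(dim=0)

    # Assumption: nearest-centroid matching of content vectors replaces
    # the paper's GraphCut matching of components to local content regions.
    c_assign = torch.cdist(f.t(), centroids).argmin(dim=1)        # (Nc,)

    # Per-component Gram losses, weighted by the matched content fraction.
    loss = f.new_zeros(())
    for j in range(k):
        fs, ss = f[:, c_assign == j], s[:, assign == j]
        if fs.shape[1] == 0 or ss.shape[1] == 0:
            continue  # skip components with no matched features
        w = fs.shape[1] / f.shape[1]
        loss = loss + w * (gram(fs) - gram(ss)).pow(2).sum()
    return loss
```

A call such as `multimodal_style_loss(encoder(content), encoder(style))`, with `encoder` being some fixed feature extractor (e.g. a VGG layer), would then be added to the generator's total objective alongside the adversarial and content terms; the combination weights are not given in the abstract.

The "hybrid-domain attention mechanism" in the method name refers to CBAM (Convolutional Block Attention Module), which applies channel attention followed by spatial attention. A standard minimal PyTorch version is sketched below; where it sits inside the generator is not stated in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    # Channel attention (shared MLP over avg/max-pooled descriptors)
    # followed by spatial attention (conv over channel-wise avg/max maps).
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial attention
```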

Key words: deep learning, animation style transfer, generative adversarial networks, multimodal matching, attention mechanism