Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (8): 204-214. DOI: 10.3778/j.issn.1002-8331.2311-0047

• Graphics and Image Processing •

Semantic Adjusted Style Transfer Network with Multi-Attention Mechanisms

ZHANG Caideng, XU Yang, MO Han, FENG Mingwen

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2025-04-15  Published: 2025-04-15


Abstract: Style transfer is a computer vision technique that transfers the style of one image onto another, creating an image with a new style. However, current arbitrary style transfer networks still suffer from problems such as unclear semantics in the fused stylized image and inconsistent overall style. To address these problems, a new multi-attention style transfer network, MatST, is proposed. The network incorporates a semantic adjustment method and improves style transfer by introducing a series of attention mechanisms. First, the RCCAB module is proposed, which combines cross convolution with channel attention to address image localization and detail representation. Second, by combining window self-attention, overlapping cross-window attention (OCAB), and a multi-head attention block (MHAB), a multi-attention module (MAB) is designed as a sub-layer of the Transformer encoder; MAB extracts image features along multiple dimensions, alleviating grid artifacts and coarse stylization. Finally, a semantic adjuster for stylized images is designed, which adjusts the semantic information of the stylized image through feedback propagation, producing stylized images with clear semantics that better match human perception. Experimental results show that, compared with the StyTr2 network, MatST reduces content loss by 0.1725 and style loss by 0.0757 on the COCO dataset. Experiments verify that the network preserves clear content semantics while producing high-quality stylized images, achieving good arbitrary style transfer.
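The RCCAB idea of pairing cross convolution (separate horizontal and vertical 1D kernels) with channel attention can be sketched minimally as below. The kernel values, feature-map shapes, and the squeeze-and-excitation-style sigmoid gating are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def cross_conv(x, kh, kv):
    """Apply a horizontal (1 x k) and a vertical (k x 1) kernel per channel
    ('cross convolution'), summing the two directional responses."""
    C, H, W = x.shape
    out = np.zeros_like(x)
    pad = len(kh) // 2
    for c in range(C):
        # 1D convolution along the width (horizontal branch)
        xp = np.pad(x[c], ((0, 0), (pad, pad)), mode="edge")
        h = sum(kh[i] * xp[:, i:i + W] for i in range(len(kh)))
        # 1D convolution along the height (vertical branch)
        xp = np.pad(x[c], ((pad, pad), (0, 0)), mode="edge")
        v = sum(kv[i] * xp[i:i + H, :] for i in range(len(kv)))
        out[c] = h + v
    return out

def channel_attention(x):
    """Squeeze-and-excitation-style gating: global average pool per
    channel, then a sigmoid gate rescales each channel map."""
    pooled = x.mean(axis=(1, 2))             # (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))     # sigmoid
    return x * gate[:, None, None]

# Toy feature map: 4 channels, 8x8 spatial resolution.
x = np.random.default_rng(0).standard_normal((4, 8, 8))
y = channel_attention(cross_conv(x, kh=[0.25, 0.5, 0.25], kv=[0.25, 0.5, 0.25]))
print(y.shape)  # (4, 8, 8)
```

The directional kernels localize edges along each axis while the channel gate re-weights feature maps, which is the stated motivation for combining the two mechanisms.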

Key words: style transfer, semantic adjustment, attention mechanism, cross convolution

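The overlapping cross-window idea behind OCAB, giving each non-overlapping query window an enlarged key/value region that crosses window boundaries, can be sketched as follows. The window size, overlap size, and edge padding are illustrative assumptions.

```python
import numpy as np

def overlapping_windows(x, win=4, overlap=2):
    """Split an H x W map into non-overlapping query windows plus, for each,
    an enlarged (win + 2*overlap) key/value window cut from a padded map,
    as in overlapping cross-window attention."""
    H, W = x.shape
    xp = np.pad(x, overlap, mode="edge")
    queries, keys = [], []
    for i in range(0, H, win):
        for j in range(0, W, win):
            queries.append(x[i:i + win, j:j + win])
            keys.append(xp[i:i + win + 2 * overlap, j:j + win + 2 * overlap])
    return queries, keys

x = np.arange(64, dtype=float).reshape(8, 8)
q, k = overlapping_windows(x, win=4, overlap=2)
print(len(q), q[0].shape, k[0].shape)  # 4 (4, 4) (8, 8)
```

Because each key/value window extends past its query window, attention can see context from neighboring windows, which is how this family of designs reduces the grid artifacts that plain windowed attention produces.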

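The semantic adjuster's feedback propagation, iteratively nudging the stylized output toward the content image's semantics, can be illustrated with a toy gradient loop on an MSE content loss. The real adjuster would back-propagate through an encoder's features; using raw pixels as the "features" here is an assumption for brevity.

```python
import numpy as np

def adjust_semantics(stylized, content, lr=0.1, steps=50):
    """Toy feedback loop: gradient descent on a squared-error 'content
    loss' between the stylized image and the content image. Features are
    raw pixels here; a real adjuster would use encoder features."""
    x = stylized.copy()
    for _ in range(steps):
        grad = 2.0 * (x - content)   # gradient of the squared error
        x -= lr * grad               # feedback step toward the content
    return x

rng = np.random.default_rng(1)
content = rng.random((8, 8))
stylized = rng.random((8, 8))
before = np.mean((stylized - content) ** 2)
after = np.mean((adjust_semantics(stylized, content) - content) ** 2)
print(after < before)  # True
```

Each step shrinks the pixel-space gap by a constant factor, so the content loss after adjustment is strictly smaller; the paper's version balances this pull against the style objective rather than collapsing onto the content image.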