Computer Engineering and Applications, 2025, Vol. 61, Issue (12): 222-231. DOI: 10.3778/j.issn.1002-8331.2405-0306

• Graphics and Image Processing •

MCFA-UNet: Combining Multiscale Fusion and Attention for Image Generation Networks

WANG Tiejun, ZHANG Zeyu, GUO Xiaoran, WU Jiao   

  1. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China
  2. China National Information Technology Research Institute, Northwest Minzu University, Lanzhou 730030, China
  • Online: 2025-06-15  Published: 2025-06-13

Abstract: Deep learning methods based on denoising diffusion probabilistic models (DDPM) have made significant progress in image generation. However, on images with complex textures and rich detail, existing models tend to produce blurred results with unclear texture details, mainly because the UNet used in the original DDPM is limited in its ability to capture highly detailed image features. To address this problem, a novel UNet based on multi-scale convolution and a fused attention mechanism, named MCFA-UNet, is proposed. The network introduces residual blocks and linear-attention multi-scale convolution modules into the encoder and decoder, and adds a multi-scale fusion attention component to the skip connections, which improves both the capture of image detail and the overall quality of the generated images. Experimental results show that, on a Thangka dataset and on the public CIFAR-10 and ImageNet-64 datasets, the DDPM with MCFA-UNet outperforms the original DDPM, achieving lower FID values and higher subjective evaluation scores, which demonstrates the effectiveness of the improvement.

Key words: image generation, denoising diffusion probabilistic models (DDPM), UNet network, artificial intelligence generated content (AIGC)
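
For readers unfamiliar with DDPM, the following minimal sketch illustrates the standard forward noising step and the noise-prediction loss that a UNet such as MCFA-UNet would be trained against. It is not the authors' implementation: the linear schedule values (1e-4 to 0.02 over 1000 steps) follow the commonly used DDPM defaults and are an assumption here, since the abstract does not state the paper's settings. All function names are hypothetical, and images are reduced to flat lists of pixel values for brevity.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Returns (x_t, noise); the noise is the regression target of the UNet.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return x_t, noise


def noise_prediction_loss(predicted, noise):
    """Mean squared error between the predicted and the true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(noise)


# Usage: noise a toy 4-pixel "image" at step index 500 of 1000.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]
x_t, noise = q_sample(x0, 500, alpha_bars, rng)
```

In training, the denoising network receives `x_t` and the step index and outputs a noise estimate that is scored with `noise_prediction_loss`. The architectural changes described in the abstract would sit inside that network and leave this noising procedure unchanged.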