Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (19): 184-191. DOI: 10.3778/j.issn.1002-8331.2205-0235

• Graphics and Image Processing •

Image Inpainting Using Contextual Feature Adjustment and Joint Self-Attention

PENG Hao, LI Xiaoming   

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. Key Laboratory of Computer Science, College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Online: 2023-10-01  Published: 2023-10-01

Abstract: Deep learning has brought unprecedented advances to image inpainting. However, because the repeated downsampling operations used during feature extraction introduce a spatial deviation between the upsampled feature maps and the corresponding bottom-up feature maps, existing methods often produce structurally distorted, texture-blurred inpainting results after feature merging. To address this problem, this paper proposes an image inpainting model that combines contextual feature adjustment with joint self-attention. The model consists of two parts: a contextual feature adjustment module and a joint self-attention module. The contextual feature adjustment module reduces the spatial deviation by adjusting each sampling position in the convolution kernel, learning per-pixel transformation offsets that align the upsampled features with their context. The joint self-attention module maintains relatively high resolution along both the spatial and channel dimensions and adopts a joint Softmax-Sigmoid nonlinearity, which lets it effectively model long-range dependencies between input and output features and thus achieve better inpainting performance. Integrating these two modules into a top-down pyramid structure strengthens the model's use of features at different scales of the image and yields a new image inpainting model. The proposed method is evaluated on publicly available datasets including CelebA, Places2, and Paris StreetView. Experimental results show that it outperforms current mainstream image inpainting techniques both qualitatively and quantitatively.
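The contextual feature adjustment described in the abstract amounts to re-sampling the upsampled feature map at learned per-pixel offsets, in the spirit of deformable convolution. The following is a minimal single-channel NumPy sketch of that re-sampling step only; the function and variable names are ours, not the paper's, and in the actual model the offsets would be regressed by a small convolution from the concatenated upsampled and bottom-up features rather than supplied directly.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Sample feat (H, W) at fractional coordinates (ys, xs) by bilinear interpolation."""
    H, W = feat.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    dy, dx = ys - y0, xs - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0]
            + (1 - dy) * dx * feat[y0, x0 + 1]
            + dy * (1 - dx) * feat[y0 + 1, x0]
            + dy * dx * feat[y0 + 1, x0 + 1])

def align_upsampled(up_feat, offsets):
    """Re-sample an upsampled feature map at learned per-pixel offsets.

    up_feat: (H, W) single-channel upsampled feature map.
    offsets: (2, H, W) predicted (dy, dx) displacements per output pixel.
    """
    H, W = up_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Shift each sampling position by its learned offset, clamped to the map.
    ys = np.clip(ys + offsets[0], 0, H - 1).astype(float)
    xs = np.clip(xs + offsets[1], 0, W - 1).astype(float)
    return bilinear_sample(up_feat, ys, xs)
```

With all-zero offsets this reduces to the identity, so the module can only move features where the learned offsets say the upsampled map is misaligned with the bottom-up map.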

Key words: deep learning, image inpainting, attention mechanism, feature adjustment
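The joint self-attention module, as the abstract describes it (relatively high resolution kept along both the spatial and channel dimensions, with a joint Softmax-Sigmoid nonlinearity), can be sketched roughly as below. The exact branch layout is our assumption, and the projection arguments `wq_c` and `wq_s` stand in for learned 1x1 convolutions; none of these names come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_self_attention(x, wq_c, wq_s):
    """Joint spatial-channel self-attention sketch.

    x:    feature map of shape (C, H, W).
    wq_c: (C,) stand-in for a learned 1x1 conv mapping C channels to 1
          (channel-branch query).
    wq_s: (C, C) stand-in for the spatial-branch query projection.
    """
    C, H, W = x.shape
    v = x.reshape(C, H * W)                 # values at full channel resolution

    # Channel branch: Softmax over spatial positions, then a Sigmoid
    # gate per channel (channel dimension kept at full resolution C).
    q_c = softmax(wq_c @ v)                 # (HW,) spatial distribution
    ch_gate = sigmoid(v @ q_c)[:, None]     # (C, 1) per-channel gate

    # Spatial branch: Softmax over channels, then a Sigmoid gate per
    # position (spatial dimension kept at full resolution H*W).
    q_s = softmax((wq_s @ v).mean(axis=1))  # (C,) channel distribution
    sp_gate = sigmoid(q_s @ v)[None, :]     # (1, HW) per-position gate

    return (v * ch_gate * sp_gate).reshape(C, H, W)
```

Because neither branch pools its own dimension away, each gate is computed from a full-resolution Softmax distribution over the other dimension, which is what lets the module capture long-range dependencies across the whole feature map.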