Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (19): 184-191. DOI: 10.3778/j.issn.1002-8331.2205-0235

• Graphics and Image Processing •

Image Inpainting Using Contextual Feature Adjustment and Joint Self-Attention

PENG Hao, LI Xiaoming   

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. Key Laboratory of Computer Science, College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Online: 2023-10-01    Published: 2023-10-01

Abstract: Deep learning has brought unprecedented advances to image inpainting. However, the repeated downsampling used during feature extraction introduces a spatial deviation between upsampled feature maps and the corresponding bottom-up feature maps, so existing methods often produce structurally distorted, texture-blurred inpainting results after feature merging. To address this problem, this paper proposes an image inpainting model that combines contextual feature adjustment with joint self-attention. The model consists of two parts: a contextual feature adjustment module and a joint self-attention module. The contextual feature adjustment module reduces the spatial deviation by adjusting each sampling position of the convolution kernel and learning per-pixel transformation offsets that align the upsampled features with their context. The joint self-attention module keeps a relatively high resolution along both the spatial and channel dimensions and adopts a joint Softmax-Sigmoid non-linearity, allowing it to effectively model long-range dependencies between input and output features and thus improve inpainting performance. Integrating the two modules into a top-down pyramid structure strengthens the model's use of image features at different scales and forms a new image inpainting model. The proposed method is evaluated on publicly available datasets including CelebA, Places2, and Paris StreetView. Experimental results show that it outperforms current mainstream image inpainting methods both qualitatively and quantitatively.
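To make the abstract's description more concrete, the following is a minimal PyTorch sketch of the two components: a contextual feature adjustment step that learns sampling offsets for a deformable convolution to align upsampled features with the bottom-up features, and a joint self-attention block that applies a Softmax-Sigmoid non-linearity along the channel and spatial dimensions. Every module name, channel size, and fusion choice here is an illustrative assumption based on the abstract only, not the authors' released implementation.

# A minimal sketch of the two modules described in the abstract.
# All shapes, channel counts, and the fusion strategy are assumptions
# for illustration; the paper's implementation details may differ.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class ContextualFeatureAdjustment(nn.Module):
    """Aligns an upsampled (top-down) feature map to its bottom-up counterpart
    by predicting per-pixel sampling offsets for a deformable convolution
    (assumed design; offsets are conditioned on both feature maps)."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # 2 offsets (dy, dx) per kernel sampling position.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size=3, padding=1)
        self.align = DeformConv2d(channels, channels, kernel_size,
                                  padding=kernel_size // 2)

    def forward(self, upsampled, lateral):
        # Concatenate both feature maps to estimate the spatial deviation.
        offsets = self.offset_pred(torch.cat([upsampled, lateral], dim=1))
        aligned = self.align(upsampled, offsets)
        # Merge the aligned top-down features with the bottom-up features.
        return aligned + lateral


class JointSelfAttention(nn.Module):
    """Channel and spatial self-attention with a joint Softmax-Sigmoid
    non-linearity, keeping full resolution along the attended dimension
    (a sketch in the spirit of the abstract's description)."""

    def __init__(self, channels, reduction=2):
        super().__init__()
        mid = channels // reduction
        # Channel branch.
        self.ch_q = nn.Conv2d(channels, 1, 1)
        self.ch_v = nn.Conv2d(channels, mid, 1)
        self.ch_out = nn.Conv2d(mid, channels, 1)
        # Spatial branch.
        self.sp_q = nn.Conv2d(channels, mid, 1)
        self.sp_v = nn.Conv2d(channels, mid, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel attention: Softmax over spatial positions, Sigmoid on channels.
        q = torch.softmax(self.ch_q(x).view(b, 1, h * w), dim=-1)     # (b, 1, hw)
        v = self.ch_v(x).view(b, -1, h * w)                           # (b, mid, hw)
        ch_ctx = torch.bmm(v, q.transpose(1, 2)).unsqueeze(-1)        # (b, mid, 1, 1)
        ch_att = torch.sigmoid(self.ch_out(ch_ctx))                   # (b, c, 1, 1)
        x_ch = x * ch_att
        # Spatial attention: Softmax over the pooled query channels, Sigmoid on positions.
        q = torch.softmax(self.sp_q(x_ch).mean(dim=(2, 3)), dim=-1)   # (b, mid)
        v = self.sp_v(x_ch).view(b, -1, h * w)                        # (b, mid, hw)
        sp_att = torch.sigmoid(torch.bmm(q.unsqueeze(1), v))          # (b, 1, hw)
        return x_ch * sp_att.view(b, 1, h, w)


if __name__ == "__main__":
    up = torch.randn(1, 64, 32, 32)    # upsampled top-down features
    lat = torch.randn(1, 64, 32, 32)   # corresponding bottom-up features
    fused = ContextualFeatureAdjustment(64)(up, lat)
    out = JointSelfAttention(64)(fused)
    print(out.shape)                   # torch.Size([1, 64, 32, 32])

In a full model, one such adjustment/attention pair would presumably be applied at each level of the top-down pyramid before decoding, but the exact placement and hyperparameters are not specified in the abstract.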

Key words: deep learning, image inpainting, attention mechanism, feature adjustment
