Face Replacement Method Based on Dual Attention and Flow Estimation

doi:10.3778/j.issn.1002-8331.2111-0408

Abstract

Abstract: The key problems in automatic video face replacement are how to reconstruct face images, how to fuse images, and how to ensure video continuity. To improve the quality of the reconstructed images and face segmentation masks and to solve the problem of unnatural video playback, an automatic face replacement method based on a dual attention mechanism and optical flow estimation is proposed. The face reconstruction network is built mainly on a generative adversarial network. To improve the feature extraction capability of the network, a dual attention module is introduced into the face reconstruction network, and depthwise separable convolutions replace some of the convolutions in the module to offset the additional computation that the module introduces. To address the loss of temporal relationships between adjacent frames after face reconstruction, a video frame processing module based on optical flow estimation and a video frame smoothing method are added. Experimental results show that, compared with the FaceSwap, DeepFakes and FaceShifter replacement methods, the proposed method better preserves the color, pose and expression of the face in the target video, gives the video better continuity, and improves the quality of the face replacement video.

Key words: video face replacement, generative adversarial network, dual attention, optical flow estimation
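
The abstract states that depthwise separable convolutions are used to reduce the computation added by the dual attention module. The following is a minimal sketch of why that substitution is cheaper, not the authors' implementation: it counts the weights of a standard k x k convolution and of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution. The function names and the layer sizes (512 channels, 3 x 3 kernel) are assumptions chosen only for illustration, and biases are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def cost_ratio(c_in: int, c_out: int, k: int) -> float:
    """Separable cost divided by standard cost; equals 1/c_out + 1/k**2."""
    return separable_conv_params(c_in, c_out, k) / conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 512, 512, 3
    print("standard :", conv_params(c_in, c_out, k))
    print("separable:", separable_conv_params(c_in, c_out, k))
    print("ratio    :", round(cost_ratio(c_in, c_out, k), 4))
```

With stride 1 and matching output size, both variants are evaluated once per output position, so the same ratio applies to multiply-accumulate operations as to weights. For a 3 x 3 kernel with many output channels, the separable form costs roughly one ninth of the standard form.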

WEI Wei, ZHANG Xin, ZHU Ye. Face Replacement Method Based on Dual Attention and Flow Estimation[J]. Computer Engineering and Applications, 2023, 59(7): 143-151.

魏玮, 张鑫, 朱叶. 基于双重注意力和光流估计的人脸替换方法[J]. 计算机工程与应用, 2023, 59(7): 143-151.
