Computer Engineering and Applications, 2023, Vol. 59, Issue (7): 143-151. DOI: 10.3778/j.issn.1002-8331.2111-0408

• Pattern Recognition and Artificial Intelligence •

Face Replacement Method Based on Dual Attention and Flow Estimation

WEI Wei, ZHANG Xin, ZHU Ye

  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
  • Online: 2023-04-01 Published: 2023-04-01

Abstract: The key problems in video face replacement are how to reconstruct the face image well, how to fuse the images, and how to keep the video temporally continuous. To improve the quality of the reconstructed images and face masks and to remove unnatural playback, this paper proposes an automatic face replacement method based on a dual attention mechanism and optical flow estimation. The face reconstruction network is built on a generative adversarial network (GAN). To strengthen its feature extraction, a dual attention module is introduced into the reconstruction network, and some of the module's convolutions are replaced with depthwise separable convolutions to offset the extra computation the module adds. Because face reconstruction loses the temporal relationship between consecutive frames, a video frame processing module based on optical flow estimation and a video frame smoothing method are added. Experimental results show that, compared with the FaceSwap, DeepFakes and FaceShifter methods, the proposed method better preserves the color, pose and expression of the face in the target video, gives the video better temporal continuity, and improves the quality of the face-swapped video.

Key words: video face replacement, generative adversarial network, dual attention, optical flow estimation
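The abstract states that depthwise separable convolutions replace part of the convolutions in the attention module to reduce computation. The minimal cost comparison below, assuming stride 1, "same" padding and no bias, shows where the saving comes from: a k×k standard convolution costs k²·C_in·C_out multiply-accumulates (MACs) per output pixel, whereas a k×k depthwise convolution followed by a 1×1 pointwise convolution costs k²·C_in + C_in·C_out, a ratio of 1/C_out + 1/k². The function names and the 256-channel, 64×64 layer are hypothetical and are not taken from the paper.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and MACs of a k x k standard convolution with an h x w output
    (assumes stride 1, 'same' padding, no bias)."""
    params = k * k * c_in * c_out
    macs = params * h * w            # every weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Same quantities for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in         # one k x k filter per input channel
    pointwise = c_in * c_out         # 1 x 1 convolution mixes the channels
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer size chosen for illustration only.
    std = standard_conv_cost(256, 256, 3, 64, 64)
    dsc = depthwise_separable_cost(256, 256, 3, 64, 64)
    print("standard:", std, "separable:", dsc, "MAC ratio:", dsc[1] / std[1])
```

For a 3×3 kernel the saving is dominated by the 1/k² term, so the separable version needs roughly one ninth of the computation regardless of channel count.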
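The abstract does not specify which dual attention formulation is used. The sketch below assumes a DANet-style design, that is, a position attention branch over spatial locations and a channel attention branch over feature maps, fused by summation. The 1×1 convolutions are written as matrix products with random weights, and the names `position_attention`, `channel_attention`, `dual_attention` and the residual scales `alpha` and `beta` are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feat, wq, wk, wv, alpha):
    """Spatial self-attention. feat: (C, H, W); wq, wk: (C', C); wv: (C, C).
    The 1x1 convolutions are modelled as matrix products over the C axis."""
    c, h, w = feat.shape
    a = feat.reshape(c, h * w)              # (C, N) with N = H * W positions
    q, k, v = wq @ a, wk @ a, wv @ a        # (C', N), (C', N), (C, N)
    attn = softmax(q.T @ k, axis=-1)        # (N, N): row i weights every position j
    out = v @ attn.T                        # out[:, i] = sum_j attn[i, j] * v[:, j]
    return (alpha * out + a).reshape(c, h, w)   # scaled residual connection


def channel_attention(feat, beta):
    """Channel self-attention computed directly from the feature maps."""
    c, h, w = feat.shape
    a = feat.reshape(c, h * w)              # (C, N)
    attn = softmax(a @ a.T, axis=-1)        # (C, C) channel-to-channel affinities
    out = attn @ a                          # each channel becomes a mix of all channels
    return (beta * out + a).reshape(c, h, w)


def dual_attention(feat, wq, wk, wv, alpha, beta):
    """Fuse both branches by element-wise summation (assumed DANet-style fusion)."""
    return position_attention(feat, wq, wk, wv, alpha) + channel_attention(feat, beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 16, 8, 8
    feat = rng.standard_normal((C, H, W))
    wq = rng.standard_normal((C // 8, C)) * 0.1   # query projection reduces channels
    wk = rng.standard_normal((C // 8, C)) * 0.1   # key projection reduces channels
    wv = rng.standard_normal((C, C)) * 0.1        # value projection keeps C channels
    out = dual_attention(feat, wq, wk, wv, alpha=0.5, beta=0.5)
    print(out.shape)  # (16, 8, 8)
```

The position branch costs O(N²) in the number of spatial positions, which is one reason a lighter convolution inside the module, as the abstract describes, is attractive.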
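The abstract likewise does not detail the optical-flow frame processing module or the smoothing method. One common way to restore temporal consistency, sketched here purely as an assumption, is to warp the previous output frame into alignment with the current frame using backward optical flow and then blend the two. The flow fields are taken as given (any optical flow estimator could supply them), and `backward_warp`, `smooth_frames` and the blending `weight` are hypothetical names and parameters.

```python
import numpy as np


def backward_warp(img, flow):
    """Warp img (H, W, C) with a backward flow field (H, W, 2) holding (dx, dy).
    Output pixel (y, x) samples img at (y + dy, x + dx) with bilinear
    interpolation; sample coordinates are clamped to the image border."""
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]           # horizontal interpolation weight
    wy = (sy - y0)[..., None]           # vertical interpolation weight
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def smooth_frames(frames, flows, weight=0.5):
    """frames: list of (H, W, C) face-swapped frames.
    flows[t]: backward flow from frame t to frame t - 1 (flows[0] is unused).
    Each output blends the current frame with the previous smoothed frame
    after warping it into alignment with the current one."""
    out = [frames[0].astype(float)]
    for t in range(1, len(frames)):
        warped = backward_warp(out[-1], flows[t])
        out.append(weight * warped + (1 - weight) * frames[t])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((32, 32, 3)) for _ in range(5)]   # stand-in swapped frames
    flows = [np.zeros((32, 32, 2)) for _ in range(5)]      # stand-in flow fields
    smoothed = smooth_frames(frames, flows, weight=0.3)
    print(len(smoothed), smoothed[-1].shape)
```

A larger `weight` suppresses flicker more strongly but lets warping errors accumulate; a practical version would also down-weight pixels where the flow is unreliable, such as occluded regions.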