[1] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Advances in Neural Information Processing Systems 27, 2014: 2672-2680.
[2] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]//Advances in Neural Information Processing Systems 33, 2020: 6840-6851.
[3] NGUYEN H H, YAMAGISHI J, ECHIZEN I. Use of a capsule network to detect fake images and videos[J]. arXiv:1910.12467, 2019.
[4] RÖSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++: learning to detect manipulated facial images[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1-11.
[5] ZHU X Y, WANG H, FEI H Y, et al. Face forgery detection by 3D decomposition[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2928-2938.
[6] DURALL R, KEUPER M, PFREUNDT F J, et al. Unmasking DeepFakes with simple features[J]. arXiv:1911.00686, 2019.
[7] QIAN Y Y, YIN G J, SHENG L, et al. Thinking in frequency: face forgery detection by mining frequency-aware clues[C]//Proceedings of the 16th European Conference on Computer Vision. Cham: Springer, 2020: 86-103.
[8] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[9] BAO H B, DONG L, PIAO S H, et al. BEiT: BERT pre-training of image transformers[J]. arXiv:2106.08254, 2021.
[10] WANG J K, WU Z X, OUYANG W H, et al. M2TR: multi-modal multi-scale transformers for deepfake detection[C]//Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022: 615-623.
[11] LUO Y C, ZHANG Y, YAN J C, et al. Generalizing face forgery detection with high-frequency features[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 16312-16321.
[12] FRIDRICH J, KODOVSKY J. Rich models for steganalysis of digital images[J]. IEEE Transactions on Information Forensics and Security, 2012, 7(3): 868-882.
[13] FANG Y X, SUN Q, WANG X G, et al. EVA-02: a visual representation for neon genesis[J]. arXiv:2303.11331, 2023.
[14] HU E J, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models[J]. arXiv:2106.09685, 2021.
[15] LI L Z, BAO J M, ZHANG T, et al. Face X-ray for more general face forgery detection[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5000-5009.
[16] ZHANG B G, LI S, FENG G R, et al. Patch diffusion: a general module for face manipulation detection[C]//Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2022: 3243-3251.
[17] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[18] ZHAO H Q, WEI T Y, ZHOU W B, et al. Multi-attentional deepfake detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2185-2194.
[19] WODAJO D, ATNAFU S. Deepfake video detection using convolutional vision transformer[J]. arXiv:2102.11126, 2021.
[20] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[J]. arXiv:2104.08691, 2021.
[21] HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 2790-2799.
[22] JIA M, TANG L, CHEN B C, et al. Visual prompt tuning[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 709-727.
[23] CHEN Z, DUAN Y C, WANG W H, et al. Vision transformer adapter for dense predictions[J]. arXiv:2205.08534, 2022.
[24] SHAO R, WU T X, NIE L Q, et al. DeepFake-adapter: dual-level adapter for DeepFake detection[J]. arXiv:2306.00863, 2023.
[25] AGHAJANYAN A, ZETTLEMOYER L, GUPTA S. Intrinsic dimensionality explains the effectiveness of language model fine-tuning[J]. arXiv:2012.13255, 2020.
[26] RAMACHANDRAN P, ZOPH B, LE Q V. Searching for activation functions[J]. arXiv:1710.05941, 2017.
[27] CAO B, GUO J L, ZHU P F, et al. Bi-directional adapter for multimodal tracking[C]//Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024: 927-935.
[28] LI Y Z, YANG X, SUN P, et al. Celeb-DF: a large-scale challenging dataset for DeepFake forensics[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3204-3213.
[29] DOLHANSKY B, BITTON J, PFLAUM B, et al. The DeepFake detection challenge (DFDC) dataset[J]. arXiv:2006.07397, 2020.
[30] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1800-1807.
[31] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[J]. arXiv:1512.03385, 2015.
[32] COCCOMINI D A, MESSINA N, GENNARO C, et al. Combining EfficientNet and vision transformers for video deepfake detection[C]//Proceedings of the 21st International Conference on Image Analysis and Processing. Cham: Springer, 2022: 219-229.
[33] CHEN L, ZHANG Y, SONG Y B, et al. Self-supervised learning of adversarial example: towards good generalizations for deepfake detection[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 18689-18698.
[34] TAN M, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 6105-6114.
[35] WAN D, CAI M C, PENG S F, et al. Deepfake detection algorithm based on dual-branch data augmentation and modified attention mechanism[J]. Applied Sciences, 2023, 13(14): 8313.