Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (10): 288-298. DOI: 10.3778/j.issn.1002-8331.2404-0213

• Network, Communication and Security •


Face Forgery Detection Based on Parameter-Efficient Fine-Tuning and Dual-Stream Network

CHEN Yonghao, CAI Manchun, ZHANG Yiwen, PENG Shufan, YAO Lifeng, ZHU Yi   

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Online: 2025-05-15  Published: 2025-05-15



Abstract: With the development of deepfake technology, forged facial images have become increasingly realistic. If deepfake technology is misused by criminals, it can endanger social order and public safety, so effective detection of deepfakes has become an important research topic. Existing deepfake detection methods generally suffer from poor generalization in cross-dataset testing and weak robustness in cross-compression-rate detection. To address these issues, this paper proposes a face forgery detection method based on parameter-efficient fine-tuning and a dual-stream network. Specifically, it employs a ViT backbone pretrained with a MIM (masked image modeling) self-supervised method as each branch, and incorporates low-rank adaptation (LoRA) for fine-tuning, retaining the prior knowledge of the pretrained model while improving its adaptability to the deepfake detection task. Moreover, a bi-directional cross-modal adapter (BCA) and a dual-modal cross-attention adapter (DCA) are designed to fine-tune the two branches and exchange complementary information between them. A multi-layer perceptron adapter is attached at the end of the dual-stream network for classification. Experimental results show that, with only 3.75×10⁷ trainable parameters, the proposed method achieves an average AUC of 99.67% across six mainstream datasets, an average AUC of 77.3% in cross-dataset generalization experiments, and an average AUC of 89.5% in cross-compression-rate experiments.
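The LoRA idea used above can be illustrated with a toy numpy sketch. This is not the paper's code: the dimensions, rank, and scaling factor are illustrative, and the point is only that the frozen pretrained weight W0 stays fixed while two small low-rank matrices A and B carry all the trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 8, 8, 2                   # toy sizes; rank r is much smaller than d

W0 = rng.standard_normal((d_out, d_in))    # frozen pretrained weight (not updated)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized

def lora_forward(x, alpha=4.0):
    # y = W0 x + (alpha / r) * B A x; only A and B change during fine-tuning
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer initially equals the frozen layer,
# so fine-tuning starts exactly from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W0 @ x)
```

Because only A and B are updated, a layer of size d_out×d_in contributes r·(d_in+d_out) trainable parameters instead of d_in·d_out, which is how the method keeps its trainable-parameter count low.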

Key words: deepfake detection, dual-stream network, parameter-efficient fine-tuning, high-frequency noise, self-supervised pretraining
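The abstract does not detail the internals of the BCA and DCA adapters, but the bi-directional cross-attention exchange they build on, between an RGB branch and a high-frequency noise branch, can be sketched generically as follows. All names, token counts, and dimensions here are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    # Queries come from one stream; keys and values from the other,
    # so each stream reads information out of its counterpart.
    d = q_tokens.shape[-1]
    attn = softmax(q_tokens @ kv_tokens.T / np.sqrt(d))
    return attn @ kv_tokens

rgb = rng.standard_normal((4, 16))    # toy tokens from the RGB branch
noise = rng.standard_normal((4, 16))  # toy tokens from the high-frequency branch

# Bi-directional exchange: each branch attends to the other and adds the
# result back residually, giving the two streams complementary information.
rgb_out = rgb + cross_attention(rgb, noise)
noise_out = noise + cross_attention(noise, rgb)
```

The residual form means the exchange can be inserted between frozen backbone blocks as a lightweight trainable adapter, in the same spirit as the BCA/DCA modules described above.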