Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (19): 259-267. DOI: 10.3778/j.issn.1002-8331.2306-0323

• Network, Communication and Security •

Research on Anti-Backdoor Learning Method Based on Preposed Unlearning

WANG Hanxu, LI Xin, XU Wentao, SI Binzhou   

  1. School of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  2. Key Laboratory of Security Prevention Technology and Risk Assessment of the Ministry of Public Security, Beijing 100026, China
  • Online: 2024-10-01  Published: 2024-09-30

Abstract: Anti-backdoor learning (ABL) detects and suppresses backdoor formation in real time while a model is trained on a poisoned dataset, so that training ends with a benign model. However, ABL cannot effectively separate backdoor samples from benign samples, and its backdoor elimination is inefficient. To address this, an anti-backdoor learning method based on preposed unlearning (ABL-PU) is proposed. In the isolation phase, it adds a purification operation on the training samples so that benign samples are isolated effectively. In the elimination phase, it adopts a paradigm of backdoor unlearning followed by model retraining and introduces an unlearning coefficient to eliminate the backdoor efficiently. On the CIFAR-10 dataset, against the classical backdoor attack method BadNets, ABL-PU improves benign accuracy by 1.21 percentage points and reduces the attack success rate by 1.38 percentage points compared with the ABL baseline.

Key words: backdoor attacks, anti-backdoor learning, data purification, preposed unlearning, unlearning coefficient
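
The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the common convention in the backdoor literature: benign accuracy is the accuracy on clean test samples, and attack success rate is the fraction of triggered test samples, excluding those whose true class is already the target, that the model assigns to the attacker's target class. The abstract does not state the exact evaluation protocol, so the function names and toy data below are hypothetical and are not taken from the paper.

```python
def benign_accuracy(clean_preds, true_labels):
    """Fraction of clean (trigger-free) test samples classified correctly."""
    correct = sum(p == y for p, y in zip(clean_preds, true_labels))
    return correct / len(true_labels)


def attack_success_rate(triggered_preds, true_labels, target_label):
    """Fraction of triggered samples predicted as the attacker's target class.

    Assumption: samples whose true class already equals the target are
    excluded, since predicting the target for them is not an attack success.
    """
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels)
             if y != target_label]
    hits = sum(p == target_label for p, _ in pairs)
    return hits / len(pairs)


def percentage_point_change(new_rate, old_rate):
    """Difference between two rates (each in [0, 1]) in percentage points."""
    return (new_rate - old_rate) * 100


# Toy data (hypothetical): 8 test samples, 4 classes, target class 0.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
clean_preds = [0, 1, 2, 3, 0, 1, 2, 0]      # predictions on clean inputs
triggered_preds = [0, 0, 2, 3, 0, 1, 0, 3]  # predictions on triggered inputs

ba = benign_accuracy(clean_preds, labels)
asr = attack_success_rate(triggered_preds, labels, target_label=0)
print(f"benign accuracy = {ba:.3f}, attack success rate = {asr:.3f}")
```

A change "in percentage points" is the plain difference between two rates expressed in percent, which is what `percentage_point_change` computes. A defense is better when benign accuracy rises and attack success rate falls.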