Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (18): 260-267. DOI: 10.3778/j.issn.1002-8331.2205-0051

• Network, Communication and Security •

Optimized Gradient Boosting Black-Box Adversarial Attack Algorithm

LIU Mengting, LING Jie

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2023-09-15 Published: 2023-09-15

Abstract: Adversarial examples can cause deep neural networks to output wrong results with high confidence. Adversarial attacks are divided into white-box and black-box attacks. White-box attacks already achieve high success rates, whereas existing black-box attacks still have low success rates because the target model and its parameters are unknown. To further improve the success rate of black-box attacks, this paper proposes an optimized gradient boosting black-box adversarial attack algorithm. First, the input image is mixed with image samples from other categories to obtain a mixed gradient that carries information from those categories. Second, the gradient variance from the previous iteration is used to adjust the gradient of the current image sample, yielding the optimized gradient. Third, the optimized gradient is combined with the Adam optimization algorithm to iteratively generate highly transferable adversarial examples. Experiments on the ImageNet dataset show that the proposed algorithm effectively improves the black-box attack capability of adversarial examples. The average attack success rates for single-model and ensemble-model attacks are 71.7% and 88.3%, respectively, and the average rises to 96.8% when the algorithm is combined with three transformation-based adversarial attack algorithms. In addition, attacks on five existing adversarial defense models achieve an average success rate of 92.7%, which outperforms current input-transformation-based and gradient-based attack methods.

Key words: adversarial examples, deep neural network, black-box attack, optimized gradient, transferability
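
The three steps named in the abstract (mixed gradient, variance-adjusted gradient, Adam-based iteration) can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the toy linear softmax classifier `W`, the mixing ratio `eta`, the neighborhood sampling used to estimate the variance term (in the style of variance-tuning attacks), the step size `eps / steps`, and all function and parameter names. It is not the authors' implementation.

```python
import numpy as np

def loss_grad(W, x, y):
    """Input gradient of softmax cross-entropy for a toy linear model W @ x."""
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0                      # dL/dlogits = softmax - one_hot(y)
    return W.T @ p

def attack(W, x, y, others, eps=0.1, steps=10, eta=0.2, n_nbr=5,
           beta=1.5, b1=0.9, b2=0.999, seed=0):
    """Hypothetical sketch: mixed gradient + variance adjustment + Adam update."""
    rng = np.random.default_rng(seed)
    alpha = eps / steps              # assumed per-step size
    x_adv = x.copy()
    m = np.zeros_like(x)             # Adam first moment
    s = np.zeros_like(x)             # Adam second moment
    v = np.zeros_like(x)             # variance term from the previous iteration
    for t in range(1, steps + 1):
        # Step 1: mixed gradient, averaged over mixes with other-class images.
        g_mix = np.mean([loss_grad(W, x_adv + eta * xo, y) for xo in others], axis=0)
        # Step 2: adjust the current gradient with the previous variance term.
        g_opt = g_mix + v
        # Estimate the variance term for the next iteration (assumed definition:
        # mean neighborhood gradient minus the gradient at the current point).
        nbrs = x_adv + rng.uniform(-beta * eps, beta * eps, size=(n_nbr, x.size))
        v = np.mean([loss_grad(W, xn, y) for xn in nbrs], axis=0) - loss_grad(W, x_adv, y)
        # Step 3: Adam-style ascent step on the loss with bias correction.
        m = b1 * m + (1 - b1) * g_opt
        s = b2 * s + (1 - b2) * g_opt ** 2
        m_hat = m / (1 - b1 ** t)
        s_hat = s / (1 - b2 ** t)
        x_adv = x_adv + alpha * m_hat / (np.sqrt(s_hat) + 1e-8)
        # Keep the perturbation inside the eps-ball and the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Toy usage with random data (3 classes, 8 "pixels", 4 other-class images).
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 8))
x = rng.uniform(0.2, 0.8, size=8)
others = rng.uniform(0.0, 1.0, size=(4, 8))
x_adv = attack(W, x, y=0, others=others)
print(np.abs(x_adv - x).max())
```

In a real black-box setting the gradient would come from a surrogate deep network rather than a linear model, and the resulting example would be transferred to the unseen target model; the sketch only shows how the three ingredients could fit into one iteration loop.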