Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (14): 275-282. DOI: 10.3778/j.issn.1002-8331.2304-0174

• Network, Communication and Security •

Gradient Aggregation Boosting Adversarial Examples Transferability Method

DENG Shiyun (邓诗芸), LING Jie (凌捷)

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2024-07-15 Published: 2024-07-15

Abstract: Image classification models based on deep neural networks are vulnerable to adversarial examples. Existing studies show that white-box attacks already achieve high success rates, but the resulting adversarial examples transfer poorly to other models. To improve the transferability of adversarial attacks, this paper proposes a gradient aggregation method for enhancing the transferability of adversarial examples. First, the original image is mixed with images from other classes in a specific ratio to obtain mixed images. Considering information from images of different classes together, and balancing the gradient contributions across classes, avoids the influence of local oscillations. Second, during iteration, the gradients of other data points in the neighborhood of the current point are aggregated to optimize the gradient direction, which avoids over-reliance on a single data point and yields adversarial examples with stronger transferability. Experimental results on the ImageNet dataset show that the proposed method significantly improves the black-box attack success rate and the transferability of adversarial examples. In single-model attacks, the method achieves an average attack success rate of 88.5% on four normally trained models, 2.7 percentage points higher than the Admix method; in ensemble-model attacks, the average attack success rate reaches 92.7%. In addition, the method can be combined with transformation-based adversarial attack methods, raising the average attack success rate on three adversarially trained models by 10.1 percentage points over the Admix method and further enhancing the transferability of adversarial attacks.

Key words: deep neural network, adversarial attacks, transferability, gradient aggregation
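
The two steps named in the abstract (mixing with other-class images, then aggregating gradients over neighboring points) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the formulas, so several details here are assumptions: the Admix-style mixing `neighbor + mix_ratio * other`, uniform sampling of neighbors, the momentum update, a toy linear softmax classifier standing in for a deep network, and all function and parameter names.

```python
import numpy as np


def input_gradient(x, label, weights):
    """Gradient of softmax cross-entropy w.r.t. the input of a linear classifier."""
    logits = weights @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[label] -= 1.0          # d(loss)/d(logits) = softmax - one_hot
    return weights.T @ probs


def aggregated_gradient(x_adv, label, weights, other_images, rng,
                        mix_ratio=0.2, n_neighbors=5, radius=0.05):
    """Average input gradients over mixed copies of points sampled near x_adv."""
    total = np.zeros_like(x_adv)
    for _ in range(n_neighbors):
        # Assumption: neighbors are drawn uniformly from a small box around x_adv.
        neighbor = x_adv + rng.uniform(-radius, radius, size=x_adv.shape)
        for other in other_images:
            # Assumption: Admix-style mixing that adds a small share of an
            # image from another class.
            mixed = neighbor + mix_ratio * other
            total += input_gradient(mixed, label, weights)
    return total / (n_neighbors * len(other_images))


def craft_adversarial(x, label, weights, other_images, eps=0.1, steps=10,
                      decay=1.0, seed=0):
    """Iterative sign-gradient attack driven by the aggregated gradient."""
    rng = np.random.default_rng(seed)
    step_size = eps / steps
    x_adv = x.copy()
    momentum = np.zeros_like(x)
    for _ in range(steps):
        grad = aggregated_gradient(x_adv, label, weights, other_images, rng)
        # Assumption: momentum accumulation with L1-normalized gradients.
        momentum = decay * momentum + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + step_size * np.sign(momentum)
        # Keep the perturbation inside the eps-ball and the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


demo_rng = np.random.default_rng(42)
weights = demo_rng.normal(size=(3, 16))        # toy 3-class linear "model"
image = demo_rng.uniform(size=16)              # toy flattened "image" in [0, 1)
others = [demo_rng.uniform(size=16) for _ in range(2)]
adversarial = craft_adversarial(image, 0, weights, others)
print(np.abs(adversarial - image).max())       # largest per-pixel change
```

Averaging over several mixed and perturbed copies means that no single input point determines the update direction, which is the intuition the abstract gives for better transfer. A real experiment would replace the linear model with a trained network and compute gradients by automatic differentiation.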