Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (14): 275-282.DOI: 10.3778/j.issn.1002-8331.2304-0174

• Network, Communication and Security •

Gradient Aggregation Boosting Adversarial Examples Transferability Method

DENG Shiyun, LING Jie   

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2024-07-15  Published: 2024-07-15


Abstract: Image classification models based on deep neural networks are vulnerable to adversarial examples. Existing studies show that white-box attacks already achieve high success rates, but the resulting adversarial examples transfer poorly when attacking other models. To improve the transferability of adversarial attacks, this paper proposes a gradient aggregation method for enhancing adversarial example transferability. First, the original image is mixed with images from other classes in a specific ratio to obtain a mixed image; by jointly considering information from images of different categories and balancing the gradient contributions across categories, the influence of local oscillations is avoided. Second, during the iterative process, the gradient information of other data points in the neighborhood of the current point is aggregated to optimize the gradient direction, avoiding excessive dependence on any single data point and thus generating adversarial examples with stronger transferability. Experimental results on the ImageNet dataset show that the proposed method significantly improves the success rate of black-box attacks and the transferability of adversarial examples. In single-model attacks, the proposed method achieves an average attack success rate of 88.5% across four conventionally trained models, 2.7 percentage points higher than the Admix method; in ensemble-model attacks, the average attack success rate reaches 92.7%. In addition, the proposed method can be integrated with transformation-based adversarial attack methods, and its average attack success rate on three adversarially trained models is 10.1 percentage points higher than that of the Admix method, further enhancing the transferability of adversarial attacks.
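The two steps the abstract describes (mixing the input with other-class images, then aggregating gradients sampled from the neighborhood of the current point before a momentum-style signed update) can be sketched as follows. This is a hypothetical illustration only: the toy linear loss, all hyperparameter names (`mix_ratio`, `n_neighbors`, `radius`), and the MI-FGSM-style momentum update are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def toy_loss_grad(x, w):
    # Toy differentiable surrogate: loss = w . x, so the gradient w.r.t. x is w.
    # A real attack would use the gradient of the classification loss of a DNN.
    return w

def aggregate_attack(x, w, other_images, eps=0.1, steps=10, mix_ratio=0.2,
                     n_neighbors=5, radius=0.05, mu=1.0, seed=0):
    """Hedged sketch of a gradient-aggregation transfer attack."""
    rng = np.random.default_rng(seed)
    alpha = eps / steps                 # per-iteration step size
    g_momentum = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(steps):
        grad_sum = np.zeros_like(x)
        # Step 1: mix the current image with images from other classes
        # to balance gradient contributions across categories.
        for x_other in other_images:
            x_mix = x_adv + mix_ratio * x_other
            # Step 2: aggregate gradients at points sampled from the
            # neighborhood of the mixed image, instead of relying on
            # the gradient at a single data point.
            for _ in range(n_neighbors):
                x_nb = x_mix + rng.uniform(-radius, radius, size=x.shape)
                grad_sum += toy_loss_grad(x_nb, w)
        g = grad_sum / (len(other_images) * n_neighbors)
        # Momentum accumulation and signed update (MI-FGSM style, assumed).
        g_momentum = mu * g_momentum + g / (np.abs(g).mean() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g_momentum)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay inside the eps-ball
    return x_adv
```

For the linear toy loss the aggregated gradient is constant, so the loop simply walks to the corner of the eps-ball in the direction of `sign(w)`; with a real network the neighborhood averaging is what smooths the gradient direction and reduces overfitting to the white-box model.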

Key words: deep neural network, adversarial attacks, transferability, gradient aggregation
