Computer Engineering and Applications (计算机工程与应用) ›› 2024, Vol. 60 ›› Issue (9): 309-316. DOI: 10.3778/j.issn.1002-8331.2212-0377

• Network, Communication and Security •

基于改进积分梯度的黑盒迁移攻击算法

王正来,关胜晓   

  1. 中国科学技术大学 信息科学技术学院,合肥 230026
  • 出版日期:2024-05-01 发布日期:2024-04-29

Black-Box Transferable Attack Method Based on Improved Integrated Gradients

WANG Zhenglai, GUAN Shengxiao   

  1. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
  • Online: 2024-05-01 Published: 2024-04-29

摘要: 对抗样本可以轻易攻击深度神经网络,影响模型的使用安全。虽然白盒攻击方法有优秀的成功率,但是在黑盒迁移攻击时通常表现出较差的效果。目前最先进的TAIG算法基于积分梯度,可以提升对抗样本的迁移性。但研究发现其积分路径缺失到饱和区的有效信息,并且使用不恰当的基线和梯度计算,限制了算法攻击成功率的上限。改进了积分梯度并提出了IIGA攻击算法(improved integrated gradients attack)。改进积分梯度将有限路径扩展为无限路径,融合输入到真实饱和区的梯度累计,可以表征每个分量更准确的重要性;并提出信息熵基线,确保基线相对于模型不含任何信息。IIGA将生成的改进积分梯度进行平滑处理作为攻击的反向优化方向,平滑操作过滤因神经网络在小范围偏导剧烈跳动而产生的大量噪点,使梯度信息集中于视觉特征,并在迭代过程中加入动量信息稳定梯度方向。在ImageNet数据集上进行的大量实验表明IIGA不仅在白盒攻击下优于FGSM、C&W等算法,在黑盒迁移攻击模式下也大大超过了SI-NI、VMI、AOA和TAIG等先进的算法。

关键词: 对抗攻击, 扩展路径, 信息熵基线, 迁移攻击

Abstract: Adversarial examples can easily fool deep neural networks and threaten the safe use of models. White-box attack methods achieve excellent success rates, but they usually perform poorly in black-box transfer attacks. TAIG, currently the most advanced algorithm, is based on integrated gradients and improves the transferability of adversarial examples. However, its integration path lacks useful information about the stretch leading to the saturation region, and it uses an inappropriate baseline and gradient computation, which together cap the attack success rate. This paper improves integrated gradients and proposes the improved integrated gradients attack (IIGA). The improved integrated gradients extend the finite path to an infinite one, incorporating the gradients accumulated from the input to the true saturation region so that the importance of each component is characterized more accurately; an information entropy baseline is also proposed to ensure that the baseline carries no information with respect to the model. IIGA smooths the improved integrated gradients and uses the result as the backward optimization direction of the attack. The smoothing filters out the large amount of noise caused by sharp fluctuations of the network's partial derivatives over small ranges, so that the gradient information concentrates on visual features, and momentum is added during iteration to stabilize the gradient direction. Extensive experiments on the ImageNet dataset show that IIGA not only outperforms algorithms such as FGSM and C&W in white-box attacks, but also substantially exceeds advanced algorithms such as SI-NI, VMI, AOA, and TAIG in black-box transfer attacks.

Key words: adversarial attack, extended path, information entropy baseline, transferable attack
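
For readers unfamiliar with the quantity the abstract builds on, the following is a minimal sketch of standard integrated gradients along a finite straight-line path. It is not the paper's improved variant and not IIGA itself. The toy scoring function, its weights, the all-zero baseline, and every function name are hypothetical and chosen only for illustration. The final check relies on the known completeness property: the attributions sum to the difference between the score at the input and the score at the baseline.

```python
import numpy as np

def toy_score(x, w):
    """Hypothetical stand-in for a model's class score; tanh makes it saturate."""
    return np.tanh(x @ w)

def toy_score_grad(x, w):
    """Analytic gradient of toy_score with respect to x."""
    return (1.0 - np.tanh(x @ w) ** 2) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Midpoint Riemann-sum approximation of standard integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1] sub-intervals
    grad_sum = np.zeros_like(x)
    for alpha in alphas:
        # Point on the straight line from the baseline to the input.
        point = baseline + alpha * (x - baseline)
        grad_sum += toy_score_grad(point, w)
    # Scale the average gradient by the input-baseline difference, per component.
    return (x - baseline) * grad_sum / steps

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.5, 0.25, -1.0])
# All-zero baseline: an assumption for this sketch, not the paper's entropy baseline.
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w)
gap = toy_score(x, w) - toy_score(baseline, w)
print(attributions, attributions.sum(), gap)
```

In a real attack setting the analytic gradient would be replaced by automatic differentiation through the network. The path extension, entropy baseline, smoothing, and momentum described in the abstract are not reproduced here.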