Computer Engineering and Applications, 2022, Vol. 58, Issue 17: 230-238. DOI: 10.3778/j.issn.1002-8331.2101-0262

• Graphics and Image Processing •

Dynamic Defense Method Against Adversarial Example Attacks Based on Siamese Structure

XIONG Su, LING Jie

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2022-09-01 Published: 2022-09-01

Abstract: Neural network models are widely used in many research fields, but they are vulnerable to adversarial example attacks. In image classification, for instance, adding a small adversarial perturbation to an original image yields an adversarial example that can easily fool a neural network classifier, which poses a serious threat to application security in many fields. Improving the ability of neural network classifiers to defend against adversarial example attacks has therefore become a research focus in deep learning security. Commonly used defenses tend to focus either on making the model robust when classifying adversarial examples or on detecting and intercepting them. Adversarial training requires collecting a large number of adversarial examples and has difficulty defending against new types of attack, while methods that detect adversarial examples with an additional classifier are vulnerable to secondary attacks. To address these problems, this paper proposes a dynamic defense method against adversarial example attacks based on a siamese neural network structure. Exploiting the ability of the siamese structure to compare the similarity of two inputs, the method examines the difference between the predictions of the two branches of the network to detect whether an image has a different attack effect before and after dynamic filtering, and thereby screens out adversarial examples that carry perturbations. Experimental results show that, without collecting any specific kind of adversarial example for training, the method achieves good general defense against a variety of adversarial example attacks: the defense accuracy reaches 95.35% on the FGSM adversarial example test set, and 93.52% and 93.73% on the DeepFool and JSMA adversarial example test sets, respectively. Moreover, the dynamic filter in the mirror defense module effectively smooths adversarial perturbations and defends against secondary attacks, which improves the overall security of the method.

Key words: deep learning, siamese network, image classification, adversarial example attack
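
The detection idea summarized in the abstract, comparing predictions for an image before and after dynamic filtering, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the filter type, the network, or the decision rule, so the mean filter with a randomly chosen kernel size, the `toy_predict` classifier, the L1 distance, and the `threshold` value below are all assumptions made for illustration only.

```python
import numpy as np

def mean_filter(img, k):
    """Smooth a 2-D image with a k x k mean filter (k odd, edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def dynamic_filter(img, rng):
    """Assumed 'dynamic' filter: the kernel size is drawn at random per call."""
    k = int(rng.choice([3, 5]))
    return mean_filter(img, k)

def is_adversarial(img, predict, rng, threshold=0.5):
    """Flag img when the raw and filtered branches disagree.

    predict: callable mapping a 2-D image to a probability vector.
    threshold: hypothetical cutoff on the L1 distance between both outputs.
    """
    p_raw = predict(img)
    p_filtered = predict(dynamic_filter(img, rng))
    label_changed = np.argmax(p_raw) != np.argmax(p_filtered)
    distance = np.abs(p_raw - p_filtered).sum()
    return bool(label_changed or distance > threshold)

def toy_predict(img):
    """Hypothetical 2-class model that is overly sensitive to a checkerboard pattern."""
    h, w = img.shape
    checker = (np.indices((h, w)).sum(axis=0) % 2) * 2 - 1  # values in {-1, +1}
    score = 50.0 * (img * checker).mean()
    logits = np.array([0.0, score - 1.0])
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 2 - 1
clean = np.full((8, 8), 0.5)            # flat image, unchanged by smoothing
perturbed = clean + 0.1 * checker       # small high-frequency perturbation

clean_flag = is_adversarial(clean, toy_predict, rng)
perturbed_flag = is_adversarial(perturbed, toy_predict, rng)
print(clean_flag, perturbed_flag)
```

In this toy setting the flat image yields identical predictions on both branches and is passed through, while the checkerboard perturbation flips the prediction of `toy_predict` on the raw image only, so the two branches disagree and the input is flagged. Drawing the kernel size at random on each call is one simple way to make the filter unpredictable to an attacker; the paper's actual dynamic filter may work differently.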