Dynamic Defense Method Against Adversarial Example Attacks Based on Siamese Structure

doi:10.3778/j.issn.1002-8331.2101-0262

Abstract

Abstract: Neural network models are widely used across many research fields, but they are vulnerable to adversarial example attacks. In image classification, for instance, adding a small adversarial perturbation to an original image is enough to make a neural network classifier misclassify it, which poses a serious security threat in many application domains. Improving the defense of neural network classifiers against adversarial example attacks has therefore become a research focus in deep learning security. Existing defenses tend to focus either on making the model classify adversarial examples robustly or on detecting and intercepting them. Adversarial training requires collecting a large number of adversarial examples and struggles against new types of attack, while methods that detect adversarial examples with an additional classifier are themselves vulnerable to secondary attacks. To address these problems, this paper proposes a dynamic defense method against adversarial example attacks based on a siamese neural network structure. Exploiting the ability of the siamese structure to compare the similarity of two inputs, the method examines the difference between the predictions of the two branches of the network to detect whether an image produces different attack effects before and after dynamic filtering, and thereby screens out adversarial examples carrying perturbations. Experimental results show that, without collecting any specific kind of adversarial example for training, the method achieves good general-purpose defense against several attacks: defense accuracy reaches 95.35% on the FGSM adversarial test set, and 93.52% and 93.73% on the DeepFool and JSMA adversarial test sets, respectively. Moreover, the dynamic filter in the mirror defense module effectively smooths adversarial perturbations and defends against secondary attacks, improving the overall security of the method.

Key words: deep learning, siamese network, image classification, adversarial example attack
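
The detection idea summarized in the abstract (compare predictions on an image before and after dynamic filtering, and flag the image when they disagree) can be illustrated with a minimal sketch. This is not the authors' siamese network: the mean filter, the random kernel-size set `[3, 5]`, the `threshold` value, and `toy_classifier` are all hypothetical stand-ins chosen only to make the comparison step concrete.

```python
import numpy as np

def mean_filter(image, k):
    """Apply a k x k mean filter to a 2-D image, using edge padding."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def dynamic_filter(image, rng):
    """Smooth with a randomly chosen kernel size (assumed choice set)."""
    k = int(rng.choice([3, 5]))
    return mean_filter(image, k)

def toy_classifier(image):
    """Hypothetical 2-class model that is sensitive to a checkerboard pattern."""
    h, w = image.shape
    checker = (-1.0) ** np.add.outer(np.arange(h), np.arange(w))
    z = (image.mean() - 0.3) + 10.0 * (image * checker).mean()
    p = 1.0 / (1.0 + np.exp(-z))
    return np.array([1.0 - p, p])

def is_adversarial(image, classify, rng, threshold=0.1):
    """Flag the image if predictions differ before and after filtering."""
    p_raw = classify(image)
    p_filtered = classify(dynamic_filter(image, rng))
    label_changed = p_raw.argmax() != p_filtered.argmax()
    gap = np.abs(p_raw - p_filtered).max()
    return bool(label_changed or gap > threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((8, 8), 0.5)
    checker = (-1.0) ** np.add.outer(np.arange(8), np.arange(8))
    perturbed = clean - 0.05 * checker  # small high-frequency perturbation
    print(is_adversarial(clean, toy_classifier, rng))
    print(is_adversarial(perturbed, toy_classifier, rng))
```

In this toy setting the smooth image gives the same prediction on both paths, while the high-frequency perturbation flips the unfiltered prediction but is smoothed away on the filtered path, so the disagreement exposes it. Choosing the filter at random on every call is meant to mirror the "dynamic" aspect: an attacker cannot tune a perturbation against one fixed filter.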

XIONG Su, LING Jie. Dynamic Defense Method Against Adversarial Example Attacks Based on Siamese Structure[J]. Computer Engineering and Applications (计算机工程与应用), 2022, 58(17): 230-238.
