Model Robustness Enhancement Algorithm with Scale Invariant Condition Number Constraint

doi:10.3778/j.issn.1002-8331.2212-0114

Abstract

Abstract: Deep neural networks are vulnerable to adversarial examples, which continues to threaten their use in safety-critical scenarios. Based on the explanation that adversarial examples arise from the highly linear behavior of neural networks, a model robustness enhancement algorithm based on a scale-invariant condition number constraint is proposed. First, during adversarial training, the norms of the weight matrices are computed, and a scale-invariant constraint term is obtained through the logarithmic function. Second, this scale-invariant condition number constraint term is incorporated into the outer optimization of adversarial training, and the condition numbers of the weight matrices are iteratively reduced through backpropagation, so that the network's linear transformations are performed in a well-conditioned high-dimensional weight space, which improves robustness against adversarial perturbations. The algorithm applies to vision models of both convolutional and Transformer architectures. It significantly improves robust accuracy against white-box attacks such as PGD and AutoAttack, and it also effectively enhances adversarial robustness against black-box attacks such as Square Attack. When the proposed constraint is combined with adversarial training of a Transformer-based image classification model, the condition numbers of the weight matrices drop by 20.7% on average, and robust accuracy against PGD attacks increases by 1.16 percentage points. Compared with similar methods such as Lipschitz constraints, the proposed algorithm also improves accuracy on clean examples, alleviating the loss of generalization caused by adversarial training.

Key words: adversarial training, adversarial robustness, condition number, scale invariance, image classification
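
The abstract does not give the exact form of the constraint term, so the following is only a minimal sketch of one plausible reading: the logarithm of the 2-norm condition number, log(sigma_max / sigma_min), summed over the weight matrices and added to the adversarial training loss. The function names, the weighting factor `lam`, and the choice of the 2-norm are assumptions made for illustration, not the authors' definition. The sketch also shows why a logarithm of a norm ratio is scale-invariant: multiplying a matrix by a positive constant leaves the value unchanged.

```python
import numpy as np


def log_condition_number(weight, eps=1e-12):
    """Return log(sigma_max / sigma_min) for a 2-D weight matrix.

    Assumption: the 2-norm condition number is used. sigma_max is the
    spectral norm of the matrix and sigma_min its smallest singular value.
    """
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    sigma_max = max(singular_values[0], eps)
    sigma_min = max(singular_values[-1], eps)  # guard against log(0)
    return np.log(sigma_max) - np.log(sigma_min)


def condition_penalty(weights, lam=0.01):
    """Sum the log condition numbers of all matrices, scaled by lam.

    lam is a hypothetical trade-off hyperparameter.
    """
    return lam * sum(log_condition_number(w) for w in weights)


def outer_loss(adversarial_loss, weights, lam=0.01):
    """Outer (minimization) objective: adversarial loss plus the penalty."""
    return adversarial_loss + condition_penalty(weights, lam)


# Scaling a matrix by a positive constant does not change the term.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
base = log_condition_number(W)
scaled = log_condition_number(1000.0 * W)
print(abs(base - scaled) < 1e-8)
```

In an actual training loop the term would be computed with an automatic differentiation framework so that backpropagation can lower the condition numbers, and convolution kernels would first need to be reshaped into 2-D matrices; both details are outside what the abstract specifies.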

XU Yangyu, GAO Baoyuan, GUO Jielong, SHAO Dongheng, WEI Xian. Model Robustness Enhancement Algorithm with Scale Invariant Condition Number Constraint[J]. Computer Engineering and Applications, 2024, 60(8): 140-147.
