Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (8): 140-147. DOI: 10.3778/j.issn.1002-8331.2212-0114

• Pattern Recognition and Artificial Intelligence •


Model Robustness Enhancement Algorithm with Scale Invariant Condition Number Constraint

XU Yangyu, GAO Baoyuan, GUO Jielong, SHAO Dongheng, WEI Xian   

  1. Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
  4. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou, Fujian 362200, China
  • Online: 2024-04-15  Published: 2024-04-15


Abstract: Deep neural networks are vulnerable to adversarial examples, which threatens their use in safety-critical scenarios. Based on the explanation that adversarial examples arise from the highly linear behavior of neural networks, this paper proposes a model robustness enhancement algorithm based on a scale-invariant condition number constraint. First, during adversarial training, the norms of the weight matrices are computed, and a scale-invariant constraint term is obtained by applying a logarithmic function. Second, this constraint term is incorporated into the outer optimization of adversarial training, so that backpropagation iteratively reduces the condition numbers of the weight matrices. The linear transformations of the network are thus performed in a well-conditioned high-dimensional weight space, which improves robustness to adversarial perturbations. The algorithm applies to vision models with both convolutional and Transformer architectures. It significantly improves robust accuracy against white-box attacks such as PGD and AutoAttack, and it also enhances robustness against black-box attacks such as Square Attack. When the proposed constraint is added to adversarial training of a Transformer-based image classification model, the condition numbers of the weight matrices drop by 20.7% on average, and robust accuracy against PGD attacks increases by 1.16 percentage points. Compared with similar methods such as Lipschitz constraints, the proposed algorithm also improves accuracy on clean examples, alleviating the reduced generalization caused by adversarial training.

Key words: adversarial training, adversarial robustness, condition number, scale-invariance, image classification
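
The abstract does not state the exact form of the constraint term, so the following is only a minimal sketch of the general idea, under the assumption that the penalty is the logarithm of the spectral condition number, log σ_max − log σ_min. Because scaling a matrix by c scales both singular values by |c|, this difference is unchanged by rescaling, which is one way to read "scale-invariant". The function names are hypothetical, and the sketch is restricted to 2×2 matrices so that the singular values have a closed form and only the standard library is needed.

```python
import math

def singular_values_2x2(w):
    """Return (s_max, s_min) for a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = w
    fro_sq = a * a + b * b + c * c + d * d   # ||W||_F^2 = s_max^2 + s_min^2
    det = a * d - b * c                      # |det W|   = s_max * s_min
    disc = math.sqrt(max(fro_sq * fro_sq - 4.0 * det * det, 0.0))
    s_max = math.sqrt((fro_sq + disc) / 2.0)
    if s_max == 0.0:                         # zero matrix: both values are 0
        return 0.0, 0.0
    return s_max, abs(det) / s_max

def log_condition_penalty(w, eps=1e-12):
    """Assumed penalty: log(s_max) - log(s_min); eps guards singular matrices."""
    s_max, s_min = singular_values_2x2(w)
    return math.log(s_max + eps) - math.log(s_min + eps)

w = [[3.0, 0.0], [0.0, 1.0]]
w_scaled = [[10.0 * x for x in row] for row in w]

# The two penalties agree up to rounding: rescaling W does not change it.
print(log_condition_penalty(w), log_condition_penalty(w_scaled))
```

In an actual training loop, a term of this kind would presumably be computed for every weight matrix with an SVD routine from a deep learning framework and added, with some weighting coefficient, to the adversarial training loss; those details are not given in the abstract.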