Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (24): 90-99. DOI: 10.3778/j.issn.1002-8331.2012-0082

• Theory, Research and Development •

Research on Quasi-hyperbolic Momentum Gradient for Adversarial Deep Reinforcement Learning

MA Zhihao, ZHU Xiangbin

  1. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
  • Online: 2021-12-15 Published: 2021-12-13

Research on Quasi-hyperbolic Momentum Gradient for Adversarial Deep Reinforcement Learning

MA Zhihao, ZHU Xiangbin   

  1. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
  • Online: 2021-12-15 Published: 2021-12-13

Abstract:

In Deep Reinforcement Learning (DRL), the agent observes the state of the environment through an observation channel. The observation may contain perturbations injected by adversarial attacks, i.e., adversarial examples, causing the agent to select wrong actions. A common way to generate adversarial examples is stochastic gradient descent. This paper proposes generating adversarial perturbations with the Quasi-Hyperbolic Momentum gradient algorithm (QHM), which makes full use of past gradient momentum to correct the gradient descent direction and is therefore more efficient than Stochastic Gradient Descent (SGD) at generating adversarial examples. The same attack is then used to train robust DRL within a robust control framework. Experimental results show that, after adversarial training, the QHM-based DRL is significantly more robust both to attacks and to changes in the environment parameters.

Key words: deep reinforcement learning, adversarial attack, quasi-hyperbolic momentum gradient, loss function
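
For reference, the attack builds on the standard quasi-hyperbolic momentum (QHM) update rule, sketched below in generic optimizer notation; the paper's exact attack loss and hyperparameters are not reproduced here. With step size α, momentum decay β, mixing weight ν, momentum buffer g, and attack loss L over the perturbed observation x:

    g_{t+1} = β·g_t + (1 − β)·∇L(x_t)
    x_{t+1} = x_t − α·[(1 − ν)·∇L(x_t) + ν·g_{t+1}]

Setting ν = 0 recovers plain SGD, while ν = 1 yields (normalized) momentum SGD; intermediate values let past gradient momentum correct the descent direction, which is the property the attack exploits.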

Abstract:

In Deep Reinforcement Learning (DRL), the agent observes the state of the environment through observation channels. The observation may contain the perturbations of adversarial attacks, pushing the observed state away from the true environment state. An engineered loss function optimized with the Quasi-Hyperbolic Momentum gradient algorithm (QHM) is used to strengthen the attack, which degrades the performance of the original DRL algorithm (for example, the Double Deep Q-Network, DDQN). This attack is then used to improve the robustness of DRL within a robust control framework. After adversarial training, the QHM-based DRL shows significantly improved robustness to changes in the environment parameters. In addition, several adversarial attacks are compared; relative to these attacks, the QHM-based approach achieves significantly stronger attack and defense capabilities.

Key words: deep reinforcement learning, adversarial attack, quasi-hyperbolic momentum gradient, loss function
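
As an illustration of how such an observation attack could be assembled, the following minimal Python sketch perturbs an agent's observation with QHM updates. It is not the authors' implementation: the Q-network q_values, the attack loss attack_loss, the perturbation budget, and all hyperparameters are hypothetical placeholders, and finite-difference gradients keep the sketch self-contained (a real attack would backpropagate through the trained network).

    import numpy as np

    def q_values(obs):
        # Hypothetical stand-in for a trained Q-network: one score per action.
        W = np.array([[0.7, -0.2, 0.1],
                      [-0.3, 0.5, 0.4]])
        return W @ obs

    def attack_loss(obs_adv, target_action):
        # The attacker minimizes the negative Q-value of a chosen (bad) action,
        # i.e. pushes the agent toward selecting target_action.
        return -q_values(obs_adv)[target_action]

    def numerical_grad(f, x, eps=1e-5):
        # Finite-difference gradient so no autodiff framework is required.
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)
        return g

    def qhm_attack(obs, target_action, steps=40, alpha=0.05, beta=0.9, nu=0.7, budget=0.3):
        # QHM step: mix the instantaneous gradient with a momentum buffer g,
        # so past gradients correct the plain SGD descent direction.
        delta = np.zeros_like(obs)
        g = np.zeros_like(obs)
        for _ in range(steps):
            grad = numerical_grad(lambda p: attack_loss(obs + p, target_action), delta)
            g = beta * g + (1.0 - beta) * grad
            delta -= alpha * ((1.0 - nu) * grad + nu * g)
            delta = np.clip(delta, -budget, budget)  # bounded perturbation budget
        return obs + delta

    clean_obs = np.array([0.2, -0.1, 0.4])
    adv_obs = qhm_attack(clean_obs, target_action=1)
    print("action on clean observation:    ", np.argmax(q_values(clean_obs)))
    print("action on perturbed observation:", np.argmax(q_values(adv_obs)))

In an adversarial training loop within a robust control setting, the perturbed observation returned by qhm_attack would be fed to the agent in place of the clean one, so the policy learns to act well under worst-case observation noise.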