Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (19): 104-111. DOI: 10.3778/j.issn.1002-8331.2009-0097

• Theory, Research and Development •

Deep Deterministic Policy Gradient Algorithm Based on Stochastic Variance Reduction Method

YANG Xueyu, CHEN Jianping, FU Qiming, LU You, WU Hongjie

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    3. Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    4. Zhuhai Mizao Intelligent Technology Co., Ltd., Zhuhai, Guangdong 519000, China
    5. Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  • Online: 2021-10-01  Published: 2021-09-29

Abstract:

Aiming at the problems of slow convergence, unstable training, large gradient variance, and low sample efficiency in the Deep Deterministic Policy Gradient (DDPG) algorithm, this paper proposes a deep deterministic policy gradient algorithm based on the Stochastic Variance Reduced Gradient (SVRG) method, called SVR-DDPG. The algorithm incorporates the SVRG technique into the parameter update process of DDPG. Under this update scheme, the variance of the estimated gradient has a continuously decreasing upper bound, so a more accurate gradient direction can be found from a small random training subset. This alleviates the error introduced by approximate gradient estimation and speeds up convergence. SVR-DDPG and DDPG are applied to the Pendulum and Mountain Car problems, and the experimental results show that SVR-DDPG converges faster and is more stable than the original algorithm, demonstrating its effectiveness.
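
For readers unfamiliar with SVRG, the sketch below illustrates the variance-reduction scheme the abstract describes, using plain NumPy on a toy least-squares objective. It implements the generic SVRG update g = ∇f_i(w) − ∇f_i(w̃) + μ̃, where μ̃ is the full gradient at a periodically refreshed snapshot w̃; it is a minimal illustration under that standard formulation, not the paper's SVR-DDPG implementation, and all function names and hyperparameters here are illustrative.

```python
# Minimal SVRG sketch (illustrative; not the paper's SVR-DDPG code).
import numpy as np

def svrg(grad_i, w0, n, lr=0.02, epochs=30, inner_steps=100, seed=None):
    """Minimize (1/n) * sum_i f_i(w) with SVRG.

    grad_i(w, i) must return the gradient of the i-th sample's loss at w.
    """
    rng = np.random.default_rng(seed)
    w_snap = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        # Full gradient at the snapshot: the "anchor" term of SVRG.
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        w = w_snap.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased for the full gradient,
            # and its variance shrinks as w approaches w_snap, giving the
            # decreasing upper bound the abstract refers to.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= lr * g
        w_snap = w  # refresh the snapshot for the next epoch
    return w_snap

# Toy usage: least squares, f_i(w) = 0.5 * (x_i @ w - y_i)^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.arange(5.0)
y = X @ w_true
grad = lambda w, i: (X[i] @ w - y[i]) * X[i]
w_hat = svrg(grad, np.zeros(5), n=200, seed=1)
print(np.linalg.norm(w_hat - w_true))  # should shrink toward 0
```

In SVR-DDPG, grad_i would presumably correspond to per-sample gradients of the critic and actor losses over mini-batches drawn from the replay buffer, with the snapshot full gradient recomputed periodically; that periodic full-gradient pass is the price paid for the reduced variance of each stochastic step.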

Key words: deep reinforcement learning, Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), stochastic variance reduced gradient technique