Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (18): 137-142. DOI: 10.3778/j.issn.1002-8331.1906-0175

• Pattern Recognition and Artificial Intelligence •


Neural Network Q Learning Algorithm Based on Residual Gradient Method

SI Yanna, PU Jiexin, ZANG Shaofei   

  1. School of Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471023, China
  • Online:2020-09-15 Published:2020-09-10


Abstract:

To solve the control problem of nonlinear systems with continuous state space, a neural network Q learning algorithm based on the residual gradient method is proposed. In this algorithm, a multi-layer feedforward neural network is used to approximate the Q-value function, and the network parameters are updated by the residual gradient method so that convergence is guaranteed. Moreover, an experience replay mechanism is used to perform mini-batch gradient updates of the network parameters, which effectively reduces the number of iterations and speeds up learning. To further improve the stability of training, momentum optimization is introduced. In addition, the Softplus activation function is adopted in place of the commonly used ReLU, avoiding the problem that, because ReLU is identically zero for negative inputs, some neurons may never be activated and their corresponding weights may never be updated. Simulation results on the CartPole control task verify the correctness and effectiveness of the proposed algorithm.
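For concreteness, the following is a minimal sketch, not the authors' implementation, of the update described above: a Softplus-activated feedforward Q network trained with a mini-batch residual-gradient step and SGD with momentum, written here in PyTorch. The network sizes, learning rate, momentum coefficient, and the dummy CartPole-like mini-batch are assumptions made purely for illustration. The essential point is that the bootstrap target r + γ max_a' Q(s', a'; θ) is not treated as a constant, so the gradient of the squared Bellman residual flows through both the current and the next-state Q values, unlike the usual semi-gradient update.

import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim=4, n_actions=2, hidden=32):  # assumed sizes
        super().__init__()
        # Softplus instead of ReLU: no exactly-zero gradient region for
        # negative pre-activations, so no permanently "dead" neurons.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def residual_gradient_step(qnet, optimizer, batch, gamma=0.99):
    # One mini-batch update on transitions (s, a, r, s', done) sampled
    # from the experience replay buffer.
    s, a, r, s_next, done = batch
    q_sa = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # Residual gradient: the target is NOT detached, so the loss gradient
    # flows through both Q(s, a) and max_a' Q(s', a').
    target = r + gamma * (1.0 - done) * qnet(s_next).max(dim=1).values
    loss = ((target - q_sa) ** 2).mean()  # mean squared Bellman residual
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

qnet = QNet()
# Plain SGD with a momentum term, matching the momentum optimization above.
optimizer = torch.optim.SGD(qnet.parameters(), lr=1e-2, momentum=0.9)

# Dummy CartPole-like mini-batch: 32 transitions with 4-dimensional states.
batch = (torch.randn(32, 4), torch.randint(0, 2, (32,)),
         torch.randn(32), torch.randn(32, 4), torch.zeros(32))
print(residual_gradient_step(qnet, optimizer, batch))

In a full experiment, the mini-batch would be drawn at random from an experience replay buffer filled while interacting with the CartPole environment, rather than generated randomly as in this sketch.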

Key words: Q learning, neural network, value function approximation, residual gradient method, experience replay