Neural Network Q Learning Algorithm Based on Residual Gradient Method

doi:10.3778/j.issn.1002-8331.1906-0175

Abstract

To address the control of nonlinear systems with continuous state spaces, a neural network Q learning algorithm based on the residual gradient method is proposed. The algorithm uses a multi-layer feedforward neural network to approximate the Q-value function and updates the network parameters with the residual gradient method to guarantee convergence. An experience replay mechanism enables mini-batch gradient updates of the network parameters, which effectively reduces the number of iterations and speeds up learning. Momentum optimization is introduced to further improve the stability of training. In addition, the Softplus activation function replaces the commonly used ReLU: because ReLU outputs zero for negative inputs, some neurons may never be activated and their parameters may never be updated, a problem that Softplus avoids. Simulation results on the CartPole control task demonstrate the correctness and effectiveness of the proposed algorithm.

Key words: Q learning, neural network, value function approximation, residual gradient method, experience replay
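
The update described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the network size, learning rate, momentum coefficient, discount factor, or terminal-state handling, so the values below (one hidden layer of 16 Softplus units, lr = 0.01, mu = 0.9, gamma = 0.99) and all function names are hypothetical. Synthetic random transitions stand in for CartPole. The residual gradient differentiates the squared Bellman residual through both Q(s, a) and the bootstrapped max over Q(s', a'), whereas the more common semi-gradient update treats the target as a constant.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)              # log(1 + e^x), numerically stable

def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))    # derivative of softplus, never exactly 0

def init_params(rng, n_state=4, n_hidden=16, n_action=2):
    # Hypothetical sizes: 4 state variables and 2 actions, as in CartPole.
    return {"W1": rng.normal(0.0, 0.1, (n_state, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_action)), "b2": np.zeros(n_action)}

def forward(p, s):
    z = s @ p["W1"] + p["b1"]                # pre-activations, shape (B, H)
    h = softplus(z)
    return h @ p["W2"] + p["b2"], z, h       # Q-values, shape (B, A)

def grad_weighted_q(p, s, a, w):
    """Gradient of sum_i w[i] * Q(s[i], a[i]) with respect to every parameter."""
    q, z, h = forward(p, s)
    dq = np.zeros_like(q)
    dq[np.arange(len(a)), a] = w
    dz = (dq @ p["W2"].T) * sigmoid(z)       # backpropagate through Softplus
    return {"W1": s.T @ dz, "b1": dz.sum(0), "W2": h.T @ dq, "b2": dq.sum(0)}

def td_error(p, batch, gamma):
    s, a, r, s2, done = batch
    rows = np.arange(len(a))
    q, q2 = forward(p, s)[0], forward(p, s2)[0]
    a2 = q2.argmax(1)                        # greedy action in the next state
    return r + gamma * (1.0 - done) * q2[rows, a2] - q[rows, a], a2

def residual_gradient(p, batch, gamma):
    """Gradient of 0.5 * mean(delta^2), differentiating through both Q terms."""
    s, a, r, s2, done = batch
    delta, a2 = td_error(p, batch, gamma)
    n = len(a)
    g_next = grad_weighted_q(p, s2, a2, gamma * (1.0 - done) * delta / n)
    g_cur = grad_weighted_q(p, s, a, delta / n)
    return {k: g_next[k] - g_cur[k] for k in p}

def momentum_step(p, vel, grad, lr=0.01, mu=0.9):
    for k in p:                              # classical momentum update, in place
        vel[k] = mu * vel[k] - lr * grad[k]
        p[k] += vel[k]

def sample_batch(buffer, rng, batch_size):
    """Uniformly sample a mini-batch of stored transitions (experience replay)."""
    idx = rng.choice(len(buffer), size=batch_size, replace=False)
    s, a, r, s2, done = zip(*(buffer[i] for i in idx))
    return (np.array(s), np.array(a), np.array(r, dtype=float),
            np.array(s2), np.array(done, dtype=float))

# Synthetic transitions (state, action, reward, next_state, done) replace a real
# CartPole rollout here, purely to exercise the update.
rng = np.random.default_rng(0)
params = init_params(rng)
velocity = {k: np.zeros_like(v) for k, v in params.items()}
buffer = [(rng.normal(size=4), int(rng.integers(2)), 1.0,
           rng.normal(size=4), bool(rng.random() < 0.1)) for _ in range(200)]
for _ in range(100):
    batch = sample_batch(buffer, rng, 32)
    momentum_step(params, velocity, residual_gradient(params, batch, gamma=0.99))
```

In an actual experiment the buffer would be filled by interacting with the environment under an exploration policy, and the semi-gradient variant could be obtained by dropping the `g_next` term, which makes the difference between the two methods easy to compare.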

SI Yanna, PU Jiexin, ZANG Shaofei. Neural Network Q Learning Algorithm Based on Residual Gradient Method[J]. Computer Engineering and Applications, 2020, 56(18): 137-142.
