Neural Network Q Learning Algorithm Based on Residual Gradient Method

doi:10.3778/j.issn.1002-8331.1906-0175

Computer Engineering and Applications, 2020, Vol. 56, Issue (18): 137-142. DOI: 10.3778/j.issn.1002-8331.1906-0175

SI Yanna, PU Jiexin, ZANG Shaofei

School of Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471023, China

Online: 2020-09-15 Published: 2020-09-10

Abstract:

To address the control of nonlinear systems with continuous state spaces, a neural network Q learning algorithm based on the residual gradient method is proposed. The algorithm uses a multi-layer feedforward neural network to approximate the Q-value function and updates the network parameters with the residual gradient method to ensure convergence. An experience replay mechanism enables mini-batch gradient updates of the network parameters, which reduces the number of iterations and speeds up learning. Momentum optimization is introduced to further improve the stability of the training process. In addition, the Softplus activation function replaces the commonly used ReLU. Because ReLU outputs zero over the entire negative region, some neurons may never be activated and their weights may never be updated; Softplus avoids this problem. Simulation experiments on the CartPole control task verify the correctness and effectiveness of the proposed algorithm.

Key words: Q learning, neural network, value function approximation, residual gradient method, experience replay
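
The ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no network size, hyperparameters, or loss scaling, so the single hidden layer of 16 units, the learning rate, the momentum coefficient, the discount factor, and all function names below are assumptions chosen for illustration. Synthetic transitions stand in for CartPole interaction. The residual gradient update differentiates the squared Bellman residual through both Q(s, a) and the bootstrap term max Q(s', a'), unlike the usual semi-gradient Q-learning update, which treats the target as a constant.

```python
from collections import deque
import numpy as np

def softplus(z):
    # log(1 + exp(z)), computed stably; its slope is never exactly zero.
    return np.logaddexp(0.0, z)

def sigmoid(z):
    # Derivative of softplus, written via tanh to avoid overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def init_params(n_state, n_hidden, n_action, rng):
    # One hidden layer is an assumption; the abstract only says "multi-layer".
    return {"W1": rng.normal(0.0, 0.1, (n_state, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_action)), "b2": np.zeros(n_action)}

def q_values(p, s):
    # Returns Q(s, .) for a batch of states and the hidden pre-activations.
    z = s @ p["W1"] + p["b1"]
    return softplus(z) @ p["W2"] + p["b2"], z

def grad_selected(p, s, a, coef):
    # Gradient of sum_i coef[i] * Q(s[i], a[i]) with respect to all parameters.
    q, z = q_values(p, s)
    dq = np.zeros_like(q)
    dq[np.arange(len(a)), a] = coef
    dz = (dq @ p["W2"].T) * sigmoid(z)
    return {"W1": s.T @ dz, "b1": dz.sum(0), "W2": softplus(z).T @ dq, "b2": dq.sum(0)}

def bellman_residual(p, batch, gamma):
    s, a, r, s2, done = batch
    q, _ = q_values(p, s)
    q2, _ = q_values(p, s2)
    delta = r + gamma * (1.0 - done) * q2.max(axis=1) - q[np.arange(len(a)), a]
    return delta, q2.argmax(axis=1)

def residual_loss(p, batch, gamma):
    # Mean squared Bellman residual over the mini-batch (factor 1/2 assumed).
    delta, _ = bellman_residual(p, batch, gamma)
    return 0.5 * np.mean(delta ** 2)

def residual_gradient(p, batch, gamma):
    # True gradient of residual_loss: both Q(s, a) and max Q(s', .) are differentiated.
    s, a, r, s2, done = batch
    delta, a2 = bellman_residual(p, batch, gamma)
    n = len(a)
    g_cur = grad_selected(p, s, a, -delta / n)
    g_next = grad_selected(p, s2, a2, gamma * (1.0 - done) * delta / n)
    return {k: g_cur[k] + g_next[k] for k in p}

def momentum_step(p, vel, grads, lr=0.01, mu=0.9):
    # Classical momentum: velocity accumulates past gradients, then moves the weights.
    for k in p:
        vel[k] = mu * vel[k] - lr * grads[k]
        p[k] += vel[k]

def sample_batch(buffer, batch_size, rng):
    # Uniform sampling without replacement from the replay buffer.
    idx = rng.choice(len(buffer), size=batch_size, replace=False)
    s, a, r, s2, d = zip(*(buffer[int(i)] for i in idx))
    return np.array(s), np.array(a), np.array(r), np.array(s2), np.array(d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(4, 16, 2, rng)  # CartPole: 4 state variables, 2 actions
    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    buffer = deque(maxlen=1000)
    for _ in range(200):  # synthetic transitions (s, a, r, s', done), not real CartPole data
        buffer.append((rng.normal(size=4), int(rng.integers(2)), 1.0,
                       rng.normal(size=4), float(rng.random() < 0.1)))
    for step in range(100):
        batch = sample_batch(buffer, 32, rng)
        momentum_step(params, velocity, residual_gradient(params, batch, 0.99))
    print("residual loss on last batch:", residual_loss(params, batch, 0.99))
```

In a real experiment the buffer would be filled by an exploring policy acting in the CartPole environment, and the hyperparameters would need tuning; the sketch only shows how the mini-batch residual gradient, the momentum update, and the Softplus hidden layer fit together.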

Cite this article: SI Yanna, PU Jiexin, ZANG Shaofei. Neural Network Q Learning Algorithm Based on Residual Gradient Method[J]. Computer Engineering and Applications, 2020, 56(18): 137-142.
