Dueling Deep Q Network Algorithm with State Value Reuse

Computer Engineering and Applications, 2021, Vol. 57, Issue 4: 134-140. DOI: 10.3778/j.issn.1002-8331.2007-0125


ZHANG Junjie, ZHANG Cong, ZHAO Hanjie

School of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan 430023, China

Online: 2021-02-15 Published: 2021-02-06

Abstract


When the Inverse Distance Weighted method (IDW) is used to predict the content of heavy metals in soil, its hyperparameters are generally set from prior knowledge, which introduces a degree of uncertainty. To address this problem, a dueling deep Q-learning network algorithm that reuses state values (RSV-DuDQN) is proposed to estimate the hyperparameters of IDW accurately. During training, the reward values of the training samples in each round are standardized and then combined with the state value produced by the Q network of Dueling-DQN to form a new total reward. This total reward is then fed into the Q network for learning. The combination strengthens the intrinsic relationship between states and actions and makes the algorithm more stable. Finally, the method is used to perform hyperparameter search on IDW and is compared experimentally with several common reinforcement learning algorithms. The experimental results show that the proposed RSV-DuDQN algorithm makes the model converge faster, improves its stability, and yields more accurate IDW parameter estimates.
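The reward step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the exact rule for combining the standardized reward with the state value, so the code makes two assumptions. First, rewards are z-score standardized over one batch. Second, they are added to the state value V using a hypothetical weight `beta`. The abstract also does not say whether V is taken for the current state or the next one, so the caller supplies it. The dueling aggregation Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a) is the standard Dueling-DQN head. All function names (`dueling_q`, `standardize`, `rsv_total_reward`, `td_targets`) are illustrative.

```python
import statistics
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Standard Dueling-DQN head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def standardize(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Z-score the rewards of one training batch (population std + eps)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rsv_total_reward(rewards: Sequence[float],
                     state_values: Sequence[float],
                     beta: float = 0.5) -> List[float]:
    """ASSUMED combination rule (hypothetical): standardized reward + beta * V.

    The paper's exact formula is not given in the abstract.
    """
    z = standardize(rewards)
    return [zr + beta * v for zr, v in zip(z, state_values)]


def td_targets(total_rewards: Sequence[float],
               next_q_values: Sequence[Sequence[float]],
               dones: Sequence[bool],
               gamma: float = 0.99) -> List[float]:
    """One-step TD targets built from the combined (total) reward."""
    return [r + (0.0 if d else gamma * max(q))
            for r, q, d in zip(total_rewards, next_q_values, dones)]


print(dueling_q(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
total = rsv_total_reward([1.0, 2.0, 3.0], [0.2, 0.0, -0.2], beta=0.5)
targets = td_targets(total, [[0.5, 2.0], [1.0, 0.0], [0.3, 0.1]],
                     [False, False, True])
```

Adding a small `eps` to the population standard deviation avoids division by zero when every reward in a batch is identical.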

Key words: reuse of state values, dueling deep Q-learning network, Inverse Distance Weighted method (IDW), hyperparameter search
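The sketch below makes the search target concrete. It implements classic (Shepard) IDW in Python, together with a leave-one-out error that a hyperparameter-search agent could use as its reward. The abstract does not say which IDW hyperparameters are learned or how the reward is defined. This sketch therefore assumes that the tuned hyperparameter is the distance-decay power `power` and that the reward is the negative leave-one-out RMSE. The names `idw_predict`, `loo_rmse` and `idw_reward` are hypothetical. The final grid search is only a brute-force reference point, not the RSV-DuDQN method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def idw_predict(samples: Sequence[Point], values: Sequence[float],
                query: Point, power: float = 2.0) -> float:
    """Shepard-style IDW estimate at `query` with weights 1 / d**power."""
    num = 0.0
    den = 0.0
    for (x, y), z in zip(samples, values):
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return float(z)  # query coincides with a sample: return it exactly
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def loo_rmse(samples: Sequence[Point], values: Sequence[float],
             power: float) -> float:
    """Leave-one-out RMSE of IDW for one candidate power value."""
    pts, vals = list(samples), list(values)
    sq_errors: List[float] = []
    for i in range(len(pts)):
        pred = idw_predict(pts[:i] + pts[i + 1:], vals[:i] + vals[i + 1:],
                           pts[i], power)
        sq_errors.append((pred - vals[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def idw_reward(samples: Sequence[Point], values: Sequence[float],
               power: float) -> float:
    """HYPOTHETICAL reward for a search agent: negative LOO RMSE."""
    return -loo_rmse(samples, values, power)


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vals = [0.0, 1.0, 2.0]
print(idw_predict(pts, vals, (1.0, 0.0)))  # 1.0 (query is a sample point)
# Brute-force baseline over a few candidate powers (not the RL method).
best_p = max([1.0, 2.0, 3.0], key=lambda p: idw_reward(pts, vals, p))
```

A reinforcement learning agent would replace the grid in the last line. It would choose the next candidate power from the reward history rather than trying every value.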


ZHANG Junjie, ZHANG Cong, ZHAO Hanjie. Dueling Deep Q Network Algorithm with State Value Reuse[J]. Computer Engineering and Applications, 2021, 57(4): 134-140.
