Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (8): 78-83. DOI: 10.3778/j.issn.1002-8331.2004-0062

• Theory, Research and Development •


Double Deep Q Network with Prioritized State Estimation

ZHANG Xin, ZHANG Xi   

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518061, China
  • Online: 2021-04-15  Published: 2021-04-23


Abstract:

In the exploration problem of deep reinforcement learning, the agent has to make decisions based on the external reward given by the environment. However, in a sparse reward environment, no information can be acquired in the early stages of training, and it is difficult to dynamically adjust the exploration strategy with the acquired information in the later stages. To alleviate this problem, a prioritized state estimation method is proposed, which assigns a priority value to a state when it is visited and stores it in the experience buffer together with the external reward, so as to guide the direction of the exploration strategy. Combined with DDQN(Double Deep Q Network) and prioritized experience replay, comparative experiments are conducted on the MountainCar classic control problem in OpenAI Gym and the FreeWay game in Atari 2600. The results show that the method achieves better learning performance and a higher average score in sparse reward environments.
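As a rough illustration of the idea described above (not the authors' implementation), the sketch below assigns each visited state a priority value, stores it in the replay buffer alongside the external reward, and uses it as the sampling priority for prioritized experience replay; a small helper shows the standard Double DQN target that sampled batches would be trained against. The count-based priority, the class and function names, and the `q_online`/`q_target` callables are illustrative assumptions only.

```python
# Minimal sketch of prioritized state estimation combined with DDQN-style targets.
# Assumptions: the priority of a state decays with its visit count (a hypothetical
# choice; the paper only specifies that visited states receive a priority value),
# and q_online / q_target are callables returning a 1-D array of Q-values per state.
from collections import defaultdict

import numpy as np


class PrioritizedStateBuffer:
    """Replay buffer storing transitions together with a per-state priority value."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.buffer = []                       # transitions: (s, a, r, s', done)
        self.priorities = []                   # one priority per stored transition
        self.visit_counts = defaultdict(int)   # hypothetical count-based estimator

    def state_priority(self, state):
        # Rarely visited states get a larger priority, so they are replayed more often.
        key = tuple(np.round(np.asarray(state), 2))   # coarse discretization for counting
        self.visit_counts[key] += 1
        return 1.0 / np.sqrt(self.visit_counts[key])

    def add(self, state, action, reward, next_state, done):
        priority = self.state_priority(state)         # stored alongside the reward
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append((state, action, reward, next_state, done))
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample transitions proportionally to their stored priority.
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]


def ddqn_targets(batch, q_online, q_target, gamma=0.99):
    """Double DQN target: the online net selects the action, the target net evaluates it."""
    targets = []
    for state, action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            best_a = int(np.argmax(q_online(next_state)))                   # selection
            targets.append(reward + gamma * q_target(next_state)[best_a])   # evaluation
    return np.asarray(targets)
```

Sampling in proportion to the stored state priority is what couples the novelty signal to exploration: even when the external reward is zero for long stretches, transitions from rarely visited states are replayed more often and shape the learned policy.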

Key words: reinforcement learning, state estimation, Deep Q Network(DQN), Double Deep Q Network(DDQN)