### Double Deep Q Network with Prioritized State Estimation

ZHANG Xin, ZHANG Xi

1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518061, China
Online: 2021-04-15 | Published: 2021-04-23


Abstract:

In deep reinforcement learning, an exploring agent must make decisions based on the external reward given by the environment. In sparse-reward environments, however, no reward information can be acquired in the early stages, and it is difficult to dynamically adjust the exploration strategy with the information acquired in the later stages. To alleviate this problem, a prioritized state estimation method is proposed: a priority value is assigned to each state when it is visited, and this value is stored in the experience buffer together with the external reward to guide the direction of exploration. Combined with DDQN (Double Deep Q Network) and prioritized experience replay, comparative experiments are conducted on the MountainCar classic control problem in OpenAI Gym and the FreeWay game on the Atari 2600. The results show that the method learns more effectively and achieves a higher average score in sparse-reward environments.
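The abstract describes assigning a priority value to each state upon visit and storing it in the experience buffer alongside the external reward. The paper's exact priority formula is not given here, so the following is a minimal sketch under the assumption of a count-based priority (rarely visited states receive higher priority), with priority-proportional sampling as in prioritized experience replay. All class and method names are illustrative, not the authors' implementation.

```python
import math
import random
from collections import defaultdict

class PrioritizedStateBuffer:
    """Sketch of an experience buffer that stores a per-state priority
    (assumed here to decay with visit count) next to the external reward,
    and samples transitions proportionally to that priority."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.buffer = []  # (state, action, reward, next_state, done, priority)
        self.visit_counts = defaultdict(int)

    def state_priority(self, state):
        # Assumed count-based estimate: priority shrinks as a state is
        # revisited, steering exploration toward under-visited states.
        self.visit_counts[state] += 1
        return 1.0 / math.sqrt(self.visit_counts[state])

    def push(self, state, action, reward, next_state, done):
        # Store the state's priority together with the external reward.
        priority = self.state_priority(state)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append((state, action, reward, next_state, done, priority))

    def sample(self, batch_size):
        # Sample transitions proportionally to stored priorities,
        # mirroring prioritized experience replay.
        priorities = [t[5] for t in self.buffer]
        total = sum(priorities)
        weights = [p / total for p in priorities]
        return random.choices(self.buffer, weights=weights, k=batch_size)
```

In a DDQN training loop, batches drawn from such a buffer would bias updates toward transitions from rarely visited states, which is one plausible way the stored priorities could shape exploration in sparse-reward tasks like MountainCar.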