Computer Engineering and Applications, 2021, Vol. 57, Issue (8): 78-83. DOI: 10.3778/j.issn.1002-8331.2004-0062


Double Deep Q Network with Prioritized State Estimation

ZHANG Xin, ZHANG Xi   

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518061, China
  Online: 2021-04-15    Published: 2021-04-23

Abstract:

In the exploration problem of deep reinforcement learning, the agent has to make decisions based on the external reward given by the environment. In a sparse-reward environment, however, no reward information can be acquired in the early stages of training, and it is difficult in the later stages to dynamically adjust the exploration strategy using the information already acquired. To alleviate this problem, a prioritized state estimation method is proposed: a priority value is assigned to each state when it is visited and is stored in the experience replay buffer together with the external reward, guiding the direction of the exploration strategy. Combining this method with DDQN (Double Deep Q Network) and prioritized experience replay, comparative experiments are conducted on the MountainCar classic control problem in OpenAI Gym and the FreeWay game in Atari 2600. The results show that the proposed method has better learning performance and achieves a higher average score in sparse-reward environments.
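To make the storage scheme described above concrete, the sketch below shows one way a per-state priority could be computed on visit and stored in the replay buffer alongside the external reward. It is a minimal illustration only: the abstract does not specify how the priority value is estimated, so the count-based novelty bonus, the discretization grid, and names such as PrioritizedStateBuffer and state_priority are assumptions made for this example, not the authors' implementation.

import random
from collections import defaultdict, deque

import numpy as np


class PrioritizedStateBuffer:
    """Replay buffer that stores a per-state priority value alongside the
    external (environment) reward for every transition."""

    def __init__(self, capacity=100000, bins=20):
        self.buffer = deque(maxlen=capacity)
        self.visit_counts = defaultdict(int)  # discretized state -> visit count
        self.bins = bins

    def _discretize(self, state, low, high):
        # Map a continuous observation onto a coarse grid so visits can be counted.
        ratio = (np.asarray(state) - low) / (high - low + 1e-8)
        return tuple(np.clip(ratio * self.bins, 0, self.bins - 1).astype(int))

    def state_priority(self, state, low, high):
        # Assumed estimator: rarely visited states receive a larger priority value.
        key = self._discretize(state, low, high)
        self.visit_counts[key] += 1
        return 1.0 / np.sqrt(self.visit_counts[key])

    def push(self, state, action, ext_reward, priority, next_state, done):
        # The priority is stored next to the external reward so the learner can
        # later combine the two signals when forming its update target.
        self.buffer.append((state, action, ext_reward, priority, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        return tuple(np.array(col) for col in zip(*batch))

In an agent loop, the priority would be computed each time a state is visited, for example p = buffer.state_priority(obs, env.observation_space.low, env.observation_space.high) for a Gym environment, followed by buffer.push(obs, action, reward, p, next_obs, done). A DDQN update could then shape its target with the stored value, e.g. reward + beta * p, which is the role the abstract assigns to the priority stored alongside the external reward.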

Key words: reinforcement learning, state estimation, Deep Q Network (DQN), Double Deep Q Network (DDQN)
