Computer Engineering and Applications, 2019, Vol. 55, Issue (7): 157-161. DOI: 10.3778/j.issn.1002-8331.1712-0280

• Pattern Recognition and Artificial Intelligence •

Research on Deep Reinforcement Learning Algorithm Based on Dynamic Fusion Target

XU Zhixiong, CAO Lei, ZHANG Yongliang, CHEN Xiliang, LI Chenxi   

  1. Institute of Command Information System, Army Engineering University, Nanjing 210000, China
  • Online: 2019-04-01    Published: 2019-04-15

Abstract: To address the overestimation problem in deep reinforcement learning, a dynamic target fusion mechanism is proposed. Building on the Deep Q-Network (DQN) algorithm, the mechanism reduces DQN's overestimation by incorporating the on-line update target of the Sarsa algorithm, while retaining the DQN target to keep learning fast. By dynamically combining the respective advantages of DQN and Sarsa, the DTDQN (Dynamic Target Deep Q-Network) algorithm is obtained. Comparative simulation experiments on the Cart-Pole control problem from the public OpenAI Gym platform show that DTDQN effectively reduces value-function overestimation, achieves better learning performance, and markedly improves training stability.
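
The abstract only sketches the fusion mechanism, so the following Python fragment is a minimal illustration under stated assumptions: a scalar fusion weight beta in [0, 1] blends the on-policy Sarsa target with the off-policy DQN target. The name fused_target and the fixed beta are hypothetical; the paper's actual dynamic weighting schedule is not given in the abstract.

import numpy as np

# Minimal sketch of a fused update target (not the paper's exact rule).
# beta and fused_target are illustrative names; DTDQN adjusts the fusion
# weight dynamically, which the abstract does not specify.
def fused_target(reward, gamma, q_next, next_action, beta, done):
    """Blend the off-policy DQN target with the on-policy Sarsa target."""
    if done:                                             # terminal transition: no bootstrap
        return reward
    dqn_target = reward + gamma * np.max(q_next)         # max operator causes overestimation
    sarsa_target = reward + gamma * q_next[next_action]  # on-policy value, no max bias
    return beta * sarsa_target + (1.0 - beta) * dqn_target

# Toy check with Cart-Pole's two actions.
q_next = np.array([0.4, 1.2])
print(fused_target(reward=1.0, gamma=0.99, q_next=q_next,
                   next_action=0, beta=0.5, done=False))

Setting beta to 0 recovers the pure DQN target and setting it to 1 the pure Sarsa target; the idea described above is to move beta dynamically during training so the on-policy term damps overestimation without giving up DQN's learning speed.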

Key words: deep reinforcement learning, overestimation, update target, dynamic fusion
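
For reference, the comparison experiments run on OpenAI Gym's Cart-Pole task. A minimal interaction loop is sketched below, assuming the classic Gym API current when the paper was published (env.step returning a 4-tuple) and a random placeholder policy in place of the trained DTDQN agent; the environment id CartPole-v0 is an assumption, as the abstract does not name a version.

import gym

env = gym.make("CartPole-v0")       # the paper's Cart-Pole control problem
state = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()         # placeholder for the DTDQN policy
    state, reward, done, _ = env.step(action)  # classic 4-tuple Gym API
    episode_return += reward
env.close()
print("episode return:", episode_return)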