Research on Deep Reinforcement Learning Algorithm Based on Dynamic Fusion Target

doi:10.3778/j.issn.1002-8331.1712-0280

Abstract: To address the overestimation problem in deep reinforcement learning algorithms, a dynamic target fusion mechanism is proposed. Building on the Deep Q-Network (DQN) algorithm, the method incorporates the online update target of the Sarsa algorithm to reduce the overestimation in DQN, while retaining the DQN target to speed up learning. Dynamically combining the respective advantages of DQN and Sarsa in this way yields the DTDQN (Dynamic Target Deep Q-Network) algorithm. Comparative simulation experiments are carried out on the Cart-Pole control problem on the open OpenAI Gym platform. The results show that DTDQN effectively reduces overestimation of the value function, achieves better learning performance, and markedly improves training stability.

Key words: deep reinforcement learning, overestimation, update target, dynamic fusion
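
The idea of fusing the two update targets can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the convex combination below, the `sarsa_weight` coefficient, and the linear schedule in `linear_sarsa_weight` are all assumptions made for illustration; they are not taken from the paper. The sketch only shows why blending in the Sarsa target can lower the bootstrap value: the Sarsa target uses the value of the action actually taken, which is never larger than the maximum used by DQN.

```python
def dqn_target(reward, next_q_values, gamma, done):
    """Off-policy DQN target: bootstrap from the greedy (max) next action value."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma, done):
    """On-policy Sarsa target: bootstrap from the action actually taken next."""
    if done:
        return reward
    return reward + gamma * next_q_values[next_action]


def fused_target(reward, next_q_values, next_action, gamma, done, sarsa_weight):
    """Blend the two targets with a convex combination.

    sarsa_weight in [0, 1] is a hypothetical mixing coefficient (assumption):
    0 gives the pure DQN target, 1 gives the pure Sarsa target.
    """
    if not 0.0 <= sarsa_weight <= 1.0:
        raise ValueError("sarsa_weight must lie in [0, 1]")
    y_dqn = dqn_target(reward, next_q_values, gamma, done)
    y_sarsa = sarsa_target(reward, next_q_values, next_action, gamma, done)
    return (1.0 - sarsa_weight) * y_dqn + sarsa_weight * y_sarsa


def linear_sarsa_weight(step, total_steps):
    """Hypothetical schedule (assumption): the weight on the Sarsa target
    grows linearly from 0 to 1 over training, clipped to [0, 1]."""
    return min(1.0, max(0.0, step / total_steps))


if __name__ == "__main__":
    next_q = [1.0, 3.0, 2.0]   # Q(s', a) for three actions
    w = linear_sarsa_weight(step=500, total_steps=1000)
    y = fused_target(reward=1.0, next_q_values=next_q, next_action=2,
                     gamma=0.5, done=False, sarsa_weight=w)
    print(w, y)
```

Under this assumed schedule, early training relies mostly on the DQN target (faster learning) and later training relies more on the Sarsa target (less overestimation); any other schedule or state-dependent weight would fit the same interface.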

XU Zhixiong, CAO Lei, ZHANG Yongliang, CHEN Xiliang, LI Chenxi. Research on Deep Reinforcement Learning Algorithm Based on Dynamic Fusion Target[J]. Computer Engineering and Applications, 2019, 55(7): 157-161.
