Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (20): 101-106. DOI: 10.3778/j.issn.1002-8331.1806-0394


Deep Q Net Based on Advantage Learning

XIA Zongtao, QIN Jin   

  1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Online: 2019-10-15   Published: 2019-10-14

Abstract: In reinforcement learning, the gap between the state-action values of different actions in the same state can be very small. Because the Q-Learning algorithm uses the max operator to select actions, it suffers from an overestimation problem, and the Deep Q Net (DQN), which combines deep networks with Q-Learning, inherits the same problem. To alleviate overestimation in the deep Q network, a deep Q network based on advantage learning is proposed. A correction term is constructed by the advantage learning method and modeled with the target value network; the sum of this correction term and the evaluation function Q of the deep Q network is used as the new evaluation function. When the selected action is the optimal action, the correction term is zero and the value of the evaluation function is left unchanged; when the selected action is not the optimal action, the correction term is negative and the estimated value of the non-optimal action is reduced. Compared with the traditional deep Q network, the deep Q network based on advantage learning achieves higher average rewards on Atari 2600 control problems such as Breakout, Seaquest, Phoenix, and Amidar, and learns more stable policies on Krull and Seaquest.
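The abstract does not give the exact form of the correction term; the following is a minimal sketch of one formulation consistent with the description above (the coefficient α and the parameter symbols θ and θ⁻ are assumptions introduced here, not taken from the paper):

\[
\tilde{Q}(s,a) \;=\; Q(s,a;\theta) \;+\; \alpha \left[\, Q(s,a;\theta^{-}) - \max_{a'} Q(s,a';\theta^{-}) \,\right], \qquad \alpha \in (0,1]
\]

Here Q(s,a;θ) is the evaluation (online) network of the DQN and Q(s,a;θ⁻) is the target value network used to model the correction term. The bracketed term is zero when a is the greedy action under the target network and negative otherwise, so the estimated values of non-optimal actions are pushed down, which widens the action gap and mitigates the overestimation caused by the max operator.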

Key words: reinforcement learning, advantage learning, Deep Q Net(DQN), overestimation
