%0 Journal Article
%A YANG Tong
%A QIN Jin
%T Adaptive ε-greedy Strategy Based on Average Episodic Cumulative Reward
%D 2021
%R 10.3778/j.issn.1002-8331.2003-0019
%J Computer Engineering and Applications
%P 148-155
%V 57
%N 11
%X

The trade-off between exploration and exploitation is one of the challenges of reinforcement learning. Exploration makes the agent try new actions to improve its policy, while exploitation makes the agent use information from historical experience to maximize the cumulative reward. The ε-greedy strategy commonly used in deep reinforcement learning handles this trade-off without considering other factors that affect the agent's decision-making, so it explores somewhat blindly. To address this problem, an adaptive ε-greedy strategy based on adjusting the exploration factor is proposed. The strategy guides the agent to explore or exploit according to the episodic cumulative reward the agent receives in each task. A larger episodic cumulative reward indicates that the actions taken by the current agent are more effective, so the adaptive strategy reduces the exploration factor to make more use of historical experience. Conversely, a smaller episodic cumulative reward means that the current policy can still be improved, so the adaptive strategy increases the exploration factor to explore more possible actions. Experimental results show that the improved strategy achieves higher average rewards on Atari 2600 games, indicating that it trades off exploration and exploitation better.
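
The adjustment described above can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything specific here is an assumption made for illustration: the class name `AdaptiveEpsilon`, the fixed additive `step`, the bounds `eps_min` and `eps_max`, and the choice to compare each episode's return against the running average of earlier returns. It should not be read as the authors' published method.

```python
import random


class AdaptiveEpsilon:
    """Hypothetical adaptive exploration factor (update rule is an assumption)."""

    def __init__(self, eps=0.5, eps_min=0.01, eps_max=1.0, step=0.05):
        self.eps = eps
        self.eps_min = eps_min
        self.eps_max = eps_max
        self.step = step
        self.total_return = 0.0  # sum of past episodic cumulative rewards
        self.episodes = 0        # number of past episodes

    def update(self, episode_return):
        """Adjust epsilon after an episode and return the new value."""
        if self.episodes > 0:
            average = self.total_return / self.episodes
            if episode_return > average:
                # Better than average: explore less, exploit more.
                self.eps -= self.step
            elif episode_return < average:
                # Worse than average: the policy can improve, explore more.
                self.eps += self.step
            # Keep epsilon inside the assumed bounds.
            self.eps = min(self.eps_max, max(self.eps_min, self.eps))
        self.total_return += episode_return
        self.episodes += 1
        return self.eps


def select_action(q_values, eps, rng=random):
    """Standard epsilon-greedy choice over a list of action values."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

In a training loop under these assumptions, `update` would be called once at the end of each episode with that episode's cumulative reward, and the returned value passed to `select_action` during the next episode.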

%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2003-0019