Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (28): 29-31. DOI: 10.3778/j.issn.1002-8331.2008.28.009

• Doctoral Forum •

Action choice mechanism of reinforcement learning based on adjusted dynamic parameters

HU Xiao-hui   

  1. School of Electronic & Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Received: 2008-06-20 Revised: 2008-07-10 Online: 2008-10-01 Published: 2008-10-01
  • Contact: HU Xiao-hui

Abstract: Reinforcement learning (RL) is an important machine learning technique that needs no labeled supervision: an agent uses reward signals from an uncertain environment to discover an optimal sequence of actions, which enables online learning in dynamic environments. RL is therefore widely used in agent systems. One of the main difficulties in applying RL is action selection, that is, how to balance exploration against exploitation. Building on Q-learning with the ε-greedy strategy, this paper introduces a counter so that the parameter ε used in action selection is adjusted in stages, giving a better balance between exploration and exploitation. Simulation experiments in a grid world demonstrate the effectiveness of the method.
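
The idea of a counter-driven, staged ε can be illustrated with a minimal Python sketch. The abstract does not give the actual schedule, so everything specific here is an assumption made for illustration: the class name StagedEpsilonGreedy, the stage thresholds and ε values, and the choice to count action selections rather than episodes. The Q-learning update is the standard one-step rule, not a detail taken from the paper.

```python
import random


class StagedEpsilonGreedy:
    """ε-greedy selection where ε changes in stages as a counter grows.

    Hypothetical sketch: the schedule below is an illustrative assumption,
    not the schedule used in the paper.
    """

    def __init__(self, n_actions,
                 schedule=((0, 0.5), (500, 0.2), (2000, 0.05)), seed=None):
        self.n_actions = n_actions
        # (counter threshold, epsilon) pairs in ascending threshold order
        self.schedule = schedule
        self.counter = 0  # number of action selections made so far (assumed unit)
        self.rng = random.Random(seed)

    def epsilon(self):
        """Return the ε of the last stage whose threshold has been reached."""
        eps = self.schedule[0][1]
        for threshold, value in self.schedule:
            if self.counter >= threshold:
                eps = value
        return eps

    def select(self, q_values):
        """Explore with probability ε, otherwise pick a greedy action."""
        eps = self.epsilon()
        self.counter += 1
        if self.rng.random() < eps:
            return self.rng.randrange(self.n_actions)
        best = max(q_values)
        # Break ties among equally good actions at random
        return self.rng.choice([a for a, q in enumerate(q_values) if q == best])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a table q[state][action]."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

In a grid-world loop, select would be called once per step with the Q-values of the current state, followed by q_update with the observed reward and next state. A large early ε favors exploration, and the later stages shift the agent toward exploiting what it has learned.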