Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (4): 42-44. DOI: 10.3778/j.issn.1002-8331.2009.04.012

• Research and Discussion •

Utility-driven Markov reinforcement learning

HAN Wei   

  1. College of Information Science, Nanjing University of Finance and Economics, Nanjing 210046, China
  • Received: 2008-01-10  Revised: 2008-03-31  Online: 2009-02-01  Published: 2009-02-01
  • Contact: HAN Wei

Markov reinforcement learning driven by utility

HAN Wei   

  1. College of Information Science, Nanjing University of Finance and Economics, Nanjing 210046, China
  • Received: 2008-01-10  Revised: 2008-03-31  Online: 2009-02-01  Published: 2009-02-01
  • Contact: HAN Wei

Abstract: The agent Q-learning method is extended to the utility-driven Markov reinforcement learning problem. In contrast to the single-absorbing-state setting, the learning process is no longer state-driven but utility-driven. The agent's learning is no longer tied to a particular goal state; instead, it maximizes the average expected reward per step, i.e., the total reward accumulated over a given number of steps, so the learning result is an optimal cycle with maximal average reward. The convergence of reinforcement learning with multiple absorbing states is proved, and the effectiveness of multi-absorbing-state Q-learning in a deterministic environment is tested by treating a raster image as a grid world with multiple absorbing states.
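The per-step criterion described above can be written down explicitly. As a sketch, assuming the paper uses the standard average-reward formulation (the symbols $\rho^{\pi}$, $r_t$ and $N$ below are our notation, not the paper's), the objective of a policy $\pi$ is

$$\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{N} r_t\right],$$

so that, instead of reaching a fixed goal state, an optimal stationary policy ends up tracing a cycle whose average reward per step is maximal.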

Key words: reinforcement learning, intelligent agent, Markov decision process

Abstract: This paper puts forward an extended model of Q-learning and discusses utility-driven Markov reinforcement learning. Compared with a learning algorithm with a single absorbing state, the learning target is not a particular state but the maximization of the agent's average utility at each decision step. The learning result is therefore a cycle that lets the agent acquire maximal rewards. Convergence of the Q-learning algorithm is proved, and simulations on image grids indicate that the learning result is such a cycle.
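The setting sketched in the abstract can be illustrated in a few lines of code. The listing below is only a minimal sketch, not the paper's algorithm: tabular Q-learning on a deterministic grid world in which several cells carry rewards and no state terminates an episode, so the greedy policy that emerges settles into a reward-maximizing loop. Every identifier and parameter here (N, REWARDS, ALPHA, GAMMA, EPSILON, EPISODES, STEPS) is an illustrative assumption.

# Minimal tabular Q-learning sketch on a deterministic grid world.
# All names and parameters below are illustrative assumptions, not taken from the paper.
import random

N = 4                                          # 4x4 grid world
REWARDS = {(0, 3): 1.0, (3, 0): 1.0}           # several reward-bearing ("absorbing") cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
EPISODES, STEPS = 500, 50

Q = {((r, c), a): 0.0 for r in range(N) for c in range(N)
     for a in range(len(ACTIONS))}

def step(state, a):
    """Deterministic transition; moves off the grid leave the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[a]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return (nr, nc), REWARDS.get((nr, nc), 0.0)

for _ in range(EPISODES):
    state = (random.randrange(N), random.randrange(N))
    for _ in range(STEPS):                     # fixed horizon: there is no terminal state,
        if random.random() < EPSILON:          # so the greedy policy is free to form a cycle
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
        nxt, reward = step(state, a)
        best_next = max(Q[(nxt, x)] for x in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (reward + GAMMA * best_next - Q[(state, a)])
        state = nxt

# Follow the greedy policy from a corner to see the loop it settles into.
state, trace = (0, 0), []
for _ in range(12):
    trace.append(state)
    a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
    state, _ = step(state, a)
print(trace)

Running the script prints the trajectory followed by the greedy policy from the corner cell; in this toy setting it typically drives toward a reward cell and then keeps revisiting it, which is the kind of reward-maximizing cycle the abstract refers to.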

Key words: reinforcement learning, intelligent agent, Markov decision process