Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (34): 47-50. DOI: 10.3778/j.issn.1002-8331.2008.34.013

• Theoretical Research •

Least-Squares Based Q(λ) Algorithm for Reinforcement Learning

CHEN Sheng-lei (陈圣磊), LI Wei-hong (李卫红), YAO Juan (姚娟)

  1. School of Information Science, Nanjing Audit University, Nanjing 210094, China
  • Received: 2008-05-21 Revised: 2008-07-24 Online: 2008-12-01 Published: 2008-12-01
  • Contact: CHEN Sheng-lei

Abstract: This paper analyzes two problems of the classical Q(λ) learning algorithm: low efficiency of experience exploitation and slow convergence. A least-squares approximation model of the state-action value function is constructed from current and multi-step experience samples. A set of linear equations satisfied by the weight vector of the function approximator over a set of basis functions is then derived, which leads to a fast and practical least-squares Q(λ) algorithm and an improved recursive algorithm. Experiments on the inverted pendulum show that these algorithms improve the efficiency of experience exploitation and effectively accelerate convergence.

Key words: reinforcement learning, Q(λ) learning, function approximation, least squares, inverted pendulum
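
The abstract mentions a weight vector defined by a set of linear equations and a recursive variant, but this page does not reproduce the equations. The Python sketch below is therefore only an illustration of the general idea, not the authors' derivation. It assumes a linear approximator Q(s, a) ≈ wᵀφ(s, a), accumulates the system A w = b with A = Σ z_t (φ_t − γ φ_{t+1})ᵀ and b = Σ z_t r_t, where z_t is an eligibility trace, and keeps the inverse of A up to date with a Sherman-Morrison rank-one update. The names `RecursiveLSQLambda`, `delta`, `cut_trace`, and `demo_chain` are hypothetical, and the two-state chain is a toy problem chosen for checking the result, not the inverted pendulum experiment.

```python
class RecursiveLSQLambda:
    """Illustrative recursive least-squares estimate of linear Q weights
    with eligibility traces (assumed formulation, not taken from the paper)."""

    def __init__(self, n_features, gamma=0.9, lam=0.5, delta=1e-3):
        self.n = n_features
        self.gamma = gamma
        self.lam = lam
        # B tracks the inverse of A = delta*I + sum_t z_t d_t^T.
        self.B = [[(1.0 / delta if i == j else 0.0) for j in range(n_features)]
                  for i in range(n_features)]
        self.w = [0.0] * n_features   # weight vector, starts at B * b with b = 0
        self.z = [0.0] * n_features   # eligibility trace

    def update(self, phi, reward, phi_next, cut_trace=False):
        """Process one transition. phi_next should be the feature vector of
        the greedy next state-action pair. cut_trace=True discards the old
        trace (a Watkins-style assumption for exploratory actions)."""
        n = self.n
        decay = 0.0 if cut_trace else self.gamma * self.lam
        self.z = [decay * zi + p for zi, p in zip(self.z, phi)]
        # d = phi_t - gamma * phi_{t+1}
        d = [p - self.gamma * q for p, q in zip(phi, phi_next)]
        Bz = [sum(self.B[i][j] * self.z[j] for j in range(n)) for i in range(n)]
        dB = [sum(d[i] * self.B[i][j] for i in range(n)) for j in range(n)]
        denom = 1.0 + sum(d[i] * Bz[i] for i in range(n))
        err = reward - sum(d[i] * self.w[i] for i in range(n))
        # Weight update equivalent to re-solving (A + z d^T) w = b + z r.
        self.w = [self.w[i] + Bz[i] * err / denom for i in range(n)]
        # Sherman-Morrison update of the inverse.
        self.B = [[self.B[i][j] - Bz[i] * dB[j] / denom for j in range(n)]
                  for i in range(n)]
        return self.w


def demo_chain(steps=200):
    """Toy check: two states, one action, 0 -> 1 (reward 0), 1 -> 0 (reward 1),
    one-hot features, so the weights are the Q-values themselves."""
    learner = RecursiveLSQLambda(n_features=2, gamma=0.9, lam=0.5)
    one_hot = [[1.0, 0.0], [0.0, 1.0]]
    state = 0
    for _ in range(steps):
        next_state = 1 - state
        reward = 1.0 if state == 1 else 0.0
        learner.update(one_hot[state], reward, one_hot[next_state])
        state = next_state
    return learner.w


if __name__ == "__main__":
    print(demo_chain())
```

For this toy chain the exact values follow from the Bellman equations Q(0) = 0.9 Q(1) and Q(1) = 1 + 0.9 Q(0), which give Q(0) = 0.9/0.19 and Q(1) = 1/0.19. The recursive estimate should approach them up to the small bias introduced by the regularizer `delta`. Each transition costs O(n²) operations, and no linear system has to be solved from scratch.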