Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (34): 47-50.DOI: 10.3778/j.issn.1002-8331.2008.34.013

• Theoretical Research •

Least-Squares based Q(λ) algorithm for reinforcement learning

CHEN Sheng-lei,LI Wei-hong,YAO Juan   

  1. School of Information Science,Nanjing Audit University,Nanjing 210094,China
  • Received:2008-05-21 Revised:2008-07-24 Online:2008-12-01 Published:2008-12-01
  • Contact: CHEN Sheng-lei


Abstract: Classical Q(λ) learning suffers from slow convergence and inefficient exploitation of experience. To address this, a least-squares approximation model of the state-action value function is constructed from current and previous experience samples, and a set of linear equations is derived for the weight vector of the function approximator over a set of basis functions. On this basis, a fast and practical least-squares Q(λ) algorithm and an improved recursive variant are proposed. Experiments on the inverted pendulum demonstrate that these algorithms effectively accelerate convergence and improve the efficiency of experience exploitation.
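The least-squares construction described above (solving a linear system for the weight vector of a Q-function approximator over a set of bases, with eligibility-weighted experience samples) can be sketched as follows. This is a minimal illustrative sketch in the style of LSTD-type Q(λ) estimation, not the authors' exact algorithm; the names `lstd_q_lambda` and `phi` are hypothetical.

```python
import numpy as np

def lstd_q_lambda(transitions, phi, gamma=0.95, lam=0.7):
    """Least-squares estimate of the Q-function weight vector w.

    transitions: list of (s, a, r, s_next, a_next) tuples from one episode.
    phi(s, a):   feature (basis) vector for a state-action pair.
    Accumulates the statistics of the linear system A w = b from
    eligibility-weighted temporal differences, then solves it once.
    """
    k = len(phi(*transitions[0][:2]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)                      # eligibility trace over features
    for s, a, r, s_next, a_next in transitions:
        f = phi(s, a)
        f_next = phi(s_next, a_next)
        z = gamma * lam * z + f          # decay and accumulate the trace
        A += np.outer(z, f - gamma * f_next)
        b += z * r
    return np.linalg.solve(A, b)
```

Solving the system once per batch of samples, rather than applying incremental gradient updates, is what lets a least-squares method reuse each experience sample more efficiently; the recursive variant mentioned in the abstract would instead update the solution incrementally (e.g. via a Sherman-Morrison-style inverse update) to avoid re-solving the full system.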

Key words: reinforcement learning, Q(λ) learning, function approximation, Least-Squares, inverted pendulum
