Nonparametric Approximation Policy Iteration Reinforcement Learning Based on CMAC

Computer Engineering and Applications, 2019, Vol. 55, Issue (2): 128-136. DOI: 10.3778/j.issn.1002-8331.1709-0489

JI Ting (季挺), ZHANG Hua (张华)

Robotics Institute, Nanchang University, Nanchang 330031, China

Online: 2019-01-15; Published: 2019-01-15

Abstract: To address the high computational complexity and slow convergence of online approximate policy iteration reinforcement learning, this paper introduces the CMAC structure as the value function approximator and proposes a nonparametric approximate policy iteration reinforcement learning algorithm based on CMAC (NPAPI-CMAC). The algorithm determines the CMAC generalization parameter by constructing a sample collection process, determines the CMAC state partition by means of an initial partition and an extended partition, and defines the reinforcement learning rate from a set of sample counts built on the quantization coding (tiling) structure. In this way the structure and parameters of the reinforcement learner are constructed fully automatically. In addition, the algorithm uses the delta rule and the nearest-neighbor idea to adaptively adjust the reinforcement learning parameters during learning, and uses a greedy policy to select among the results produced by an action voter. Simulation results on the balancing control of a single inverted pendulum verify the effectiveness, robustness, and fast convergence of the proposed algorithm.

Key words: reinforcement learning, Cerebellar Model Articulation Controller (CMAC), nonparametric, inverted pendulum
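The abstract names three ingredients that can be illustrated generically: a CMAC (tile-coding) value function approximator, a delta-rule weight update, and greedy action selection. The sketch below is a minimal illustration of those ideas only, and it is not the NPAPI-CMAC algorithm. The class name `CMAC`, the fixed `n_tilings`, `bins`, and `lr` values, and the uniform tiling offsets are all assumptions made for illustration, whereas the paper reports deriving the generalization parameter, state partition, and learning rate automatically. The action voter and nearest-neighbor adjustment are not shown.

```python
import numpy as np


class CMAC:
    """Minimal CMAC (tile-coding) approximator of Q(s, a) on a box-shaped state space.

    Hypothetical illustration: all hyperparameters are hand-picked here.
    """

    def __init__(self, low, high, n_actions, n_tilings=8, bins=6, lr=0.5):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.n_tilings = n_tilings  # generalization parameter (assumed fixed)
        self.bins = bins            # intervals per dimension per tiling (assumed)
        self.lr = lr                # fixed learning rate (assumed)
        dims = len(self.low)
        # Tiling t is shifted by t / n_tilings of one cell width in every dimension.
        self.offsets = np.arange(n_tilings)[:, None] / n_tilings * np.ones(dims)
        # One extra cell per dimension absorbs the shift at the upper boundary.
        self.w = np.zeros((n_tilings,) + (bins + 1,) * dims + (n_actions,))

    def _cells(self, state):
        """Return, for each tiling, the index tuple of the cell containing the state."""
        scaled = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        scaled = np.clip(scaled, 0.0, 1.0) * self.bins
        idx = np.floor(scaled + self.offsets).astype(int)  # shape (n_tilings, dims)
        return [(t,) + tuple(idx[t]) for t in range(self.n_tilings)]

    def q_values(self, state):
        """Sum the weights of the active cells, giving one value per action."""
        return sum(self.w[c] for c in self._cells(state))

    def greedy_action(self, state):
        """Pick the action with the largest approximated value."""
        return int(np.argmax(self.q_values(state)))

    def update(self, state, action, target):
        """Delta rule: share the prediction error equally among the active cells."""
        error = target - self.q_values(state)[action]
        for c in self._cells(state):
            self.w[c + (action,)] += self.lr * error / self.n_tilings
        return error
```

In this sketch, repeated calls such as `cmac.update(state, action, target)` move the estimate for `state` toward `target`, and states that share tiles with `state` move with it, which is the local generalization that makes CMAC attractive as a value function approximator. In a control loop for a task such as pendulum balancing, `target` would typically be a bootstrapped return estimate, which is left out here.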

JI Ting, ZHANG Hua. Nonparametric Approximation Policy Iteration Reinforcement Learning Based on CMAC[J]. Computer Engineering and Applications, 2019, 55(2): 128-136.
