Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (2): 128-136. DOI: 10.3778/j.issn.1002-8331.1709-0489

• Pattern Recognition and Artificial Intelligence •


Nonparametric Approximation Policy Iteration Reinforcement Learning Based on CMAC

JI Ting, ZHANG Hua   

  1. Robotics Institute, Nanchang University, Nanchang 330031, China
  • Online: 2019-01-15  Published: 2019-01-15


Abstract: To solve the problems of high computational complexity and slow convergence of online approximate policy iteration reinforcement learning, this paper introduces the CMAC structure as the value function approximator and proposes a nonparametric approximation policy iteration reinforcement learning algorithm based on CMAC (NPAPI-CMAC). The CMAC generalization parameter is determined by constructing a sample-collection process, and its state partition is determined through an initial partition followed by an expanded partition. The reinforcement learning rate is defined from the set of sample counts gathered for each tiling of the quantization coding structure, so that the structure and parameters of the reinforcement learner are constructed fully automatically. In addition, the algorithm uses the delta rule and the nearest-neighbor idea to adaptively adjust the learning parameters during learning, and applies a greedy policy to select among the actions produced by the action voter. Simulation results on the balancing control of a single inverted pendulum demonstrate the effectiveness, robustness, and rapid convergence of the proposed algorithm.
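The core machinery the abstract describes — a tile-coded (CMAC) value approximator whose weights are adjusted by the delta rule — can be sketched as follows. This is a minimal one-dimensional illustration only; the class name, tiling counts, input range, and learning rate are hypothetical choices for the sketch, not the automatically constructed parameters of NPAPI-CMAC:

```python
import numpy as np

class CMACSketch:
    """Minimal 1-D tile-coding value approximator with a delta-rule update.
    All parameters here are illustrative, not taken from the paper."""

    def __init__(self, n_tilings=8, n_tiles=10, lo=-1.0, hi=1.0, alpha=0.1):
        self.n_tilings = n_tilings            # number of overlapping tilings
        self.n_tiles = n_tiles                # tiles per tiling
        self.lo, self.hi = lo, hi             # input range
        self.alpha = alpha                    # overall learning rate
        # one weight table per tiling (+1 tile to absorb boundary overflow)
        self.w = np.zeros((n_tilings, n_tiles + 1))

    def _active_tiles(self, x):
        # each tiling is shifted by a fraction of one tile width,
        # so nearby inputs share some but not all active tiles
        width = (self.hi - self.lo) / self.n_tiles
        idx = []
        for t in range(self.n_tilings):
            offset = t * width / self.n_tilings
            i = int((x - self.lo + offset) / width)
            idx.append(min(max(i, 0), self.n_tiles))
        return idx

    def value(self, x):
        # CMAC output: sum of the one active weight in every tiling
        return sum(self.w[t, i] for t, i in enumerate(self._active_tiles(x)))

    def update(self, x, target):
        # delta rule: move each active weight toward the target,
        # splitting the step evenly across the tilings
        delta = target - self.value(x)
        for t, i in enumerate(self._active_tiles(x)):
            self.w[t, i] += (self.alpha / self.n_tilings) * delta

cmac = CMACSketch()
for _ in range(200):
    cmac.update(0.3, 1.0)   # repeatedly fit a single target value
print(round(cmac.value(0.3), 3))
```

Because the tilings overlap with staggered offsets, an update at one input generalizes to neighboring inputs — this local generalization is what makes CMAC a cheap function approximator for online policy iteration compared with global approximators.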

Key words: reinforcement learning, Cerebellar Model Articulation Controller(CMAC), nonparametric, inverted pendulum