Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (2): 128-136. DOI: 10.3778/j.issn.1002-8331.1709-0489


Nonparametric Approximation Policy Iteration Reinforcement Learning Based on CMAC

JI Ting, ZHANG Hua   

  1. Robotics Institute, Nanchang University, Nanchang 330031, China
  • Online: 2019-01-15  Published: 2019-01-15


Abstract: To address the high computational complexity and slow convergence of online approximate policy iteration reinforcement learning, this paper introduces the CMAC structure as the value function approximator and proposes a nonparametric approximation policy iteration reinforcement learning algorithm based on CMAC (NPAPI-CMAC). The generalization parameter of the CMAC is determined by constructing a sample collection process, its state partition is obtained through an initial partition followed by an expansion partition, and the reinforcement learning rate is defined from the set of per-tile sample counts built by the quantization coding structure, so that the structure and parameters of the learner are constructed fully automatically. In addition, the algorithm uses the delta rule and the nearest-neighbor idea to adaptively adjust its learning parameters during learning, and applies a greedy strategy to select an action from the results of an action voting mechanism. Simulation results on the balance control of a single inverted pendulum demonstrate the effectiveness, robustness and fast convergence of the proposed algorithm.

Key words: reinforcement learning, Cerebellar Model Articulation Controller (CMAC), nonparametric, inverted pendulum
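
The abstract describes the approach only at a high level. As a rough illustration of the underlying machinery, the Python sketch below shows a CMAC (tile-coding) action-value approximator with a delta-rule weight update and greedy action selection. The class name, the layer and tile configuration, the fixed learning rate and the method names are illustrative assumptions; the paper's automatic construction of the generalization parameter, state partition and learning rate, and its action voting mechanism, are not reproduced here.

# Minimal sketch of a CMAC (tile-coding) action-value approximator with a
# delta-rule update and greedy action selection, for illustration only.
# The layer count, tile widths and learning rate below are illustrative
# assumptions, not the automatically constructed parameters of NPAPI-CMAC.
import numpy as np


class CMACQApproximator:
    def __init__(self, n_layers, tiles_per_dim, state_low, state_high,
                 n_actions, alpha=0.1):
        self.n_layers = n_layers                      # number of overlapping tilings
        self.tiles_per_dim = np.asarray(tiles_per_dim)
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.n_actions = n_actions
        self.alpha = alpha / n_layers                 # spread the step size over layers
        # one weight table per (layer, action); tiles are indexed per state dimension
        self.weights = np.zeros((n_layers, n_actions, *(self.tiles_per_dim + 1)))

    def _active_tiles(self, state):
        """Return the tile index activated in each layer for a given state."""
        s = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        s = np.clip(s, 0.0, 1.0)
        tiles = []
        for layer in range(self.n_layers):
            # each layer is shifted by a fraction of one tile width
            offset = layer / self.n_layers
            idx = np.floor(s * self.tiles_per_dim + offset).astype(int)
            idx = np.minimum(idx, self.tiles_per_dim)  # stay inside the weight table
            tiles.append(tuple(idx))
        return tiles

    def value(self, state, action):
        """Q(s, a): sum of the weights of the active tiles across all layers."""
        return sum(self.weights[(layer, action) + tile]
                   for layer, tile in enumerate(self._active_tiles(state)))

    def update(self, state, action, target):
        """Delta rule: move each active weight toward the given target value."""
        delta = target - self.value(state, action)
        for layer, tile in enumerate(self._active_tiles(state)):
            self.weights[(layer, action) + tile] += self.alpha * delta

    def greedy_action(self, state):
        """Pick the action with the highest approximated value."""
        q = [self.value(state, a) for a in range(self.n_actions)]
        return int(np.argmax(q))

In a single-inverted-pendulum setting, for instance, one would typically instantiate such an approximator over the four-dimensional state (cart position, cart velocity, pole angle, pole angular velocity) and feed it temporal-difference targets of the form r + gamma * max_a Q(s', a) during policy evaluation.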
