Limit cycles in inverted pendulum system by reinforcement learning

Computer Engineering and Applications, 2008, Vol. 44, Issue 10: 16-19.

ZHENG Yu, LUO Si-wei, LV Zi-ang

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Received: 2007-11-21 Revised: 2007-12-25 Online: 2008-04-01 Published: 2008-04-01
Corresponding author: ZHENG Yu

Abstract

The inverted pendulum is an important application domain for reinforcement learning. This paper first shows that commonly used reinforcement learning algorithms fall into limit cycles in the inverted pendulum system, so that the algorithms fail to converge correctly and the resulting control policy is unstable. Because this instability is not yet pronounced in the simple single inverted pendulum, and because many studies aim only to keep the pendulum upright for a given period of time, the limit cycle problem is often overlooked. To address it, the paper proposes a reinforcement learning algorithm based on an action continuity criterion. The algorithm revises the reinforcement signal and improves the exploration policy to overcome the effect of limit cycles on the inverted pendulum system. Simulation and real-system control results on a double inverted pendulum show that the algorithm not only controls the pendulum successfully but also keeps the control policy stable.

Key words: limit cycles, reinforcement learning, inverted pendulum
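
The abstract names two mechanisms, a revised reinforcement signal and an improved exploration policy, but gives no formulas for either. The sketch below is an illustration only, not the authors' algorithm. It shows one way an action continuity criterion could enter a tabular learner: a reward penalty proportional to the jump between consecutive control forces, and an epsilon-greedy rule whose exploratory moves stay close to the previous action. The force levels in `ACTIONS`, the `penalty` weight, the `max_step` window, and both function names are assumptions made for this example.

```python
import random

# Hypothetical discrete force levels in newtons; not taken from the paper.
ACTIONS = [-10.0, -5.0, 0.0, 5.0, 10.0]


def shaped_reward(base_reward, prev_action, action, penalty=0.01):
    """Subtract a cost proportional to the jump between consecutive forces.

    prev_action and action are indices into ACTIONS. The linear penalty
    form is an assumption, chosen only to illustrate the idea.
    """
    return base_reward - penalty * abs(ACTIONS[action] - ACTIONS[prev_action])


def select_action(q_row, prev_action, epsilon, rng, max_step=1):
    """Epsilon-greedy choice whose exploratory moves stay near prev_action.

    q_row holds the Q-values of the current state, one per action index.
    """
    if rng.random() < epsilon:
        # Explore only within max_step indices of the previous action.
        low = max(0, prev_action - max_step)
        high = min(len(q_row) - 1, prev_action + max_step)
        return rng.randint(low, high)
    # Greedy branch: highest Q-value, ties broken toward the previous action.
    return max(range(len(q_row)),
               key=lambda a: (q_row[a], -abs(a - prev_action)))
```

With `rng = random.Random(0)` and `epsilon=1.0`, every call to `select_action` returns an index within `max_step` of `prev_action`, so an exploratory move cannot swing the force from one extreme to the other in a single step. Whether such a scheme actually suppresses limit cycles on a given pendulum model would have to be checked experimentally.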

ZHENG Yu, LUO Si-wei, LV Zi-ang. Limit cycles in inverted pendulum system by reinforcement learning[J]. Computer Engineering and Applications, 2008, 44(10): 16-19.
