Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (10): 16-19.

• Doctoral Forum •

Limit cycles in inverted pendulum system by reinforcement learning

ZHENG Yu, LUO Si-wei, LV Zi-ang

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2007-11-21 Revised: 2007-12-25 Online: 2008-04-01 Published: 2008-04-01
  • Contact: ZHENG Yu

Abstract: The inverted pendulum is an important application domain for reinforcement learning. This paper first shows that commonly used reinforcement learning algorithms suffer from a limit cycle problem in the inverted pendulum system: the algorithm fails to converge correctly and the control policy becomes unstable. In a simple single inverted pendulum the instability of the control policy is not yet obvious, and many studies aim only to keep the pendulum upright for a given time, so the limit cycle problem is often overlooked. To address it, the paper proposes a reinforcement learning algorithm based on an action continuity criterion. The algorithm revises the reinforcement signal and improves the exploration policy to overcome the effect of limit cycles on the inverted pendulum system. Simulation and actual control results on a double inverted pendulum system show that the algorithm not only controls the pendulum successfully but also keeps the control policy stable.

Key words: limit cycles, reinforcement learning, inverted pendulum
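
The abstract names two mechanisms, a revised reinforcement signal and an improved exploration policy, but gives no formulas. As a rough illustration only, the Python sketch below shows one way an action continuity idea could be expressed: a reward that is reduced when consecutive control actions jump far apart, and an exploration step that prefers actions near the previous one. The function names, the linear penalty, the `weight` and `epsilon` parameters, and the neighbourhood size `k` are all assumptions made for this sketch and are not taken from the paper.

```python
import random


def continuity_shaped_reward(reward, prev_action, action, weight=0.1):
    """Hypothetical shaping: subtract a penalty proportional to the jump
    between consecutive actions, so rapid switching is discouraged."""
    return reward - weight * abs(action - prev_action)


def continuity_biased_explore(actions, prev_action, greedy_action,
                              epsilon=0.1, k=3, rng=random):
    """Hypothetical epsilon-greedy variant: with probability epsilon,
    explore only among the k candidate actions closest to prev_action;
    otherwise return the greedy action unchanged."""
    if rng.random() >= epsilon:
        return greedy_action
    nearest = sorted(actions, key=lambda a: abs(a - prev_action))[:k]
    return rng.choice(nearest)


if __name__ == "__main__":
    forces = [-10.0, -5.0, 0.0, 5.0, 10.0]  # illustrative discrete force levels
    # Switching from -10 to +10 is penalised; repeating an action is not.
    print(continuity_shaped_reward(1.0, -10.0, 10.0))
    print(continuity_shaped_reward(1.0, 5.0, 5.0))
    # Exploration stays near the previous action (10.0 here).
    print(continuity_biased_explore(forces, 10.0, 10.0, epsilon=1.0))
```

Under these assumptions, a policy that alternates between extreme forces on every step earns less than one that holds the pendulum with small, smooth corrections, which is the kind of oscillating behaviour a limit cycle would produce.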