Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (3): 72-79. DOI: 10.3778/j.issn.1002-8331.2010-0038

• Hot Topics and Reviews •


Overview of Hierarchical Reinforcement Learning

LAI Jun, WEI Jingyi, CHEN Xiliang   

  1. College of Command Information System, Army Engineering University, Nanjing 210007, China
  • Online: 2021-02-01  Published: 2021-01-29


Abstract:

In recent years, reinforcement learning has increasingly demonstrated its powerful learning ability: in 2017, AlphaGo defeated the world champion at Go, and in the complex competitive games StarCraft II and DOTA 2, top human teams were likewise beaten by AI. However, reinforcement learning has its own weaknesses, and bottlenecks have gradually emerged as it develops. Hierarchical reinforcement learning can alleviate the curse of dimensionality, allowing it to perform better in environments that are more complex and have larger action spaces, so research on it has been heating up in recent years. This paper briefly introduces the basic theory of reinforcement learning and the three classical hierarchical reinforcement learning algorithms, Option, HAMs, and MAXQ. It then surveys and analyzes, from three aspects, the hierarchical reinforcement learning algorithms proposed in recent years under the hierarchical idea, and discusses the development prospects and challenges of hierarchical reinforcement learning.
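As context for the Options framework named in the abstract, the following is a minimal illustrative sketch (not from the paper itself): an option is a temporally extended action defined by an initiation set, an intra-option policy, and a termination condition. The `Option` class, the toy one-dimensional corridor environment, and the `go_right` option are all our own hypothetical examples.

```python
import random


class Option:
    """A temporally extended action in the Options framework: an
    initiation set I (states where the option may be invoked), an
    intra-option policy pi (state -> primitive action), and a
    termination condition beta (state -> probability of stopping)."""

    def __init__(self, name, initiation_set, policy, termination_prob):
        self.name = name
        self.initiation_set = initiation_set
        self.policy = policy
        self.termination_prob = termination_prob

    def available(self, state):
        return state in self.initiation_set


def run_option(option, state, step_fn, rng, max_steps=100):
    """Execute an option until its termination condition fires (or a
    step cap is hit). step_fn(state, action) -> next_state is the
    environment transition function."""
    trajectory = [state]
    for _ in range(max_steps):
        action = option.policy(state)
        state = step_fn(state, action)
        trajectory.append(state)
        if rng.random() < option.termination_prob(state):
            break
    return state, trajectory


# Toy 1-D corridor: states 0..10, primitive actions -1 / +1.
# A hypothetical "go right to the goal" option: always moves right,
# terminates deterministically upon reaching state 10.
go_right = Option(
    name="go_right",
    initiation_set=set(range(10)),
    policy=lambda s: +1,
    termination_prob=lambda s: 1.0 if s == 10 else 0.0,
)

step = lambda s, a: max(0, min(10, s + a))
final, traj = run_option(go_right, 3, step, random.Random(0))
```

A higher-level policy then chooses among such options instead of primitive actions, which is what shrinks the effective decision space in large environments.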

Key words: hierarchical reinforcement learning, subpolicy sharing, multi-layer hierarchical structure, automatic stratification