Computer Engineering and Applications, 2022, Vol. 58, Issue (8): 33-44. DOI: 10.3778/j.issn.1002-8331.2112-0082

• Research Hotspots and Reviews •

Review of Research on Approximate Reinforcement Learning Algorithms

SI Yanna, PU Jiexin, SUN Lifan   

  1. School of Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471023, China
  2. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Online: 2022-04-15 Published: 2022-04-15

近似强化学习算法研究综述

司彦娜,普杰信,孙力帆   

  1. 河南科技大学 信息工程学院,河南 洛阳 471023
  2. 电子科技大学 信息与通信工程学院,成都 611731

Abstract: Reinforcement learning (RL) is one of the most important techniques in artificial intelligence (AI). However, traditional tabular reinforcement learning struggles with control problems that have large-scale or continuous spaces. Approximate reinforcement learning draws on the idea of function approximation: it parameterizes the value function or the policy function and obtains the optimal policy indirectly through parameter optimization. It has been widely applied to video games, Go, and robot control, with remarkable results. This paper therefore reviews the research status and application progress of approximate reinforcement learning algorithms. First, the basic theory of approximate reinforcement learning is introduced. Then the classical algorithms of approximate reinforcement learning are classified and described, together with some of their improved variants. Finally, research progress on approximate reinforcement learning in robotics is summarized, and several major open problems are outlined to provide a reference for future research.

Key words: reinforcement learning, continuous space, value function approximation, direct policy search, policy gradient

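Abstract: Reinforcement learning addresses optimal decision-making problems in the model-free setting and is one of the key techniques for realizing artificial intelligence, but traditional tabular reinforcement learning methods have difficulty handling control problems with large-scale or continuous spaces. Inspired by the idea of function approximation, approximate reinforcement learning represents the value function or the policy function in parameterized form and obtains the optimal behavior policy indirectly through parameter optimization; it has achieved notable results in video games, board-game play, and robot control. On this basis, the research status and application progress of approximate reinforcement learning algorithms are surveyed. The basic theory of approximate reinforcement learning is introduced; the classical algorithms and some corresponding improved methods are classified and summarized; and research progress in robot control is outlined, together with several major problems currently faced, to provide a reference for subsequent research.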

Key words: reinforcement learning, continuous space, value function approximation, direct policy search, policy gradient