Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (15): 30-36. DOI: 10.3778/j.issn.1002-8331.2001-0347

• Hot Topics and Reviews •


3D Path Planning Algorithm Based on Deep Reinforcement Learning

HUANG Dongjin, JIANG Chenfeng, HAN Kaili   

  1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  2. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China
  • Online: 2020-08-01  Published: 2020-07-30


Abstract:

Reasonable route selection is a key difficulty in 3D path planning for agents. Existing 3D path planning methods cannot adapt well to unknown terrain, and their obstacle avoidance is limited to a single form. To address these problems, a 3D path planning algorithm for agents based on LSTM-PPO is proposed. Virtual rays are used to probe the simulated environment, and the collected state space and actions are fed into a Long Short-Term Memory (LSTM) network. Through an additional reward function and an intrinsic curiosity module, the agent learns to jump over low obstacles and to avoid large ones. The clipped surrogate objective of the PPO algorithm keeps the magnitude of planning-policy updates within a suitable range. Experimental results show that the algorithm is feasible, selects routes more intelligently and reasonably, and adapts well to unknown environments containing diverse obstacles.
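The clipped surrogate objective mentioned in the abstract is the standard PPO objective of Schulman et al.; the paper itself does not give code, so the following is only an illustrative NumPy sketch of that standard formulation, not the authors' implementation. `ratio` denotes the probability ratio between the new and old policies for sampled actions, and `epsilon` is the clipping range.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective from PPO (illustrative sketch).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) per sampled action.
    advantage: estimated advantage A_t for the same samples.
    epsilon:   clipping range that bounds the policy-update step.
    """
    unclipped = ratio * advantage
    # Clipping the ratio removes the incentive to move the policy
    # far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # The elementwise minimum makes the objective a pessimistic bound:
    # ratios outside [1 - eps, 1 + eps] earn no additional objective value.
    return np.mean(np.minimum(unclipped, clipped))

# Example: a ratio of 1.5 with positive advantage is clipped to 1.2.
value = ppo_clip_objective(np.array([1.5]), np.array([1.0]), epsilon=0.2)
```

Maximizing this objective is what constrains the update magnitude of the planning policy that the abstract refers to.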

Key words: deep reinforcement learning, Proximal Policy Optimization (PPO) algorithm, path planning, complex unknown environment