Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (15): 30-36. DOI: 10.3778/j.issn.1002-8331.2001-0347

• Hot Topics and Reviews •


3D Path Planning Algorithm Based on Deep Reinforcement Learning

HUANG Dongjin, JIANG Chenfeng, HAN Kaili   

  1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  2. Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072, China
  • Online: 2020-08-01  Published: 2020-07-30


Abstract:

Reasonable route selection is a key difficulty in 3D path planning for agents. Existing 3D path planning methods cannot adapt well to unknown terrain, and their obstacle avoidance is limited to a single form. To address these problems, a 3D path planning algorithm for agents based on LSTM-PPO is proposed. Virtual rays are used to probe the simulated environment, and the collected state space and actions are fed into a Long Short-Term Memory (LSTM) network. Through an additional reward function and an intrinsic curiosity module, the agent learns to jump over low obstacles and to avoid large ones. The clipped surrogate objective of the PPO algorithm keeps the magnitude of planning-policy updates within a suitable range. Experimental results show that the algorithm is feasible, selects routes more intelligently and reasonably, and adapts well to unknown environments containing diverse obstacles.
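The clipped surrogate objective mentioned in the abstract is the standard PPO objective of Schulman et al.; the paper itself does not give code, so the following is only an illustrative NumPy sketch of that standard formulation, not the authors' implementation. `ratio` denotes the probability ratio between the new and old policies for sampled actions, and `epsilon` is the clipping range.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective from PPO (illustrative sketch).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) per sampled action.
    advantage: estimated advantage A_t for the same samples.
    epsilon:   clipping range that bounds the policy-update step.
    """
    unclipped = ratio * advantage
    # Clipping the ratio removes the incentive to move the policy
    # far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # The elementwise minimum makes the objective a pessimistic bound:
    # ratios outside [1 - eps, 1 + eps] earn no additional objective value.
    return np.mean(np.minimum(unclipped, clipped))

# Example: a ratio of 1.5 with positive advantage is clipped to 1.2.
value = ppo_clip_objective(np.array([1.5]), np.array([1.0]), epsilon=0.2)
```

Maximizing this objective is what constrains the update magnitude of the planning policy that the abstract refers to.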

Key words: deep reinforcement learning, Proximal Policy Optimization (PPO) algorithm, path planning, complex unknown environment