Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (16): 129-134. DOI: 10.3778/j.issn.1002-8331.1704-0427

• Pattern Recognition and Artificial Intelligence •


Reinforcement learning path planning algorithm based on gravitational potential field and trap search

DONG Peifang1, ZHANG Zhi’an1, MEI Xinhu2, ZHU Shuo1   

  1. School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
  • Online: 2018-08-15    Published: 2018-08-09


Abstract: It is difficult for a mobile robot to obtain a near-optimal path in a complex environment. The Q-learning algorithm, based on a Markov decision process, can learn a good path by trial and error, but it converges slowly, requires many iterations, and its trial-and-error approach cannot be applied in a real environment. This paper adds a gravitational potential field to the Q-learning algorithm as prior information about the initial environment, then searches the environment layer by layer for trap regions and excludes concave trap regions from the Q-value iteration, which speeds up the convergence of path planning. At the same time, trial-and-error learning of obstacles is eliminated, so the algorithm avoids obstacles effectively from the initial state and can learn directly in a real environment. Complex maps are built with Python and the pygame module to verify the path-planning performance of the improved Q-learning algorithm with the initial gravitational potential field and trap search. Simulation results show that the improved algorithm reaches the target position quickly and effectively after fewer iterations, and the resulting path is near-optimal.
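
The abstract gives no code, so the sketch below is only a rough illustration of the two mechanisms it describes: seeding a grid-world Q-table with an attractive (gravitational) potential toward the goal, and masking concave "trap" cells out of the Q-value iteration. The grid size, obstacle layout, reward values, and the single-cell trap test are assumptions made for this example and simplify the paper's layer-by-layer trap search; this is not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): potential-field Q initialization
# plus exclusion of concave trap cells from Q-learning on a small grid world.
import numpy as np

ROWS, COLS = 10, 10
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
GOAL = (9, 9)
OBSTACLES = {(4, 4), (4, 5), (4, 6), (5, 6), (6, 4), (6, 5), (6, 6)}  # example map

def attractive_potential(cell, goal, k=1.0):
    """Attractive potential: larger (less negative) the closer to the goal."""
    return -k * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

# (1) Initialize each state-action value from the potential of the successor
# cell, so the initial greedy policy already points roughly toward the goal
# and moves into walls/obstacles are never selected (no trial and error there).
Q = np.zeros((ROWS, COLS, len(ACTIONS)))
for r in range(ROWS):
    for c in range(COLS):
        for a, (dr, dc) in enumerate(ACTIONS):
            nr, nc = r + dr, c + dc
            if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in OBSTACLES:
                Q[r, c, a] = attractive_potential((nr, nc), GOAL)
            else:
                Q[r, c, a] = -np.inf

# (2) Crude concave-trap test (an assumption standing in for the paper's
# layer-by-layer search): a free cell walled in on three of its four sides.
def is_trap(cell):
    if cell in OBSTACLES or cell == GOAL:
        return False
    blocked = 0
    for dr, dc in ACTIONS:
        nr, nc = cell[0] + dr, cell[1] + dc
        if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in OBSTACLES:
            blocked += 1
    return blocked >= 3

TRAPS = {(r, c) for r in range(ROWS) for c in range(COLS) if is_trap((r, c))}

# Remove trap cells from the Q-value iteration: moves into them are never chosen.
for r in range(ROWS):
    for c in range(COLS):
        for a, (dr, dc) in enumerate(ACTIONS):
            if (r + dr, c + dc) in TRAPS:
                Q[r, c, a] = -np.inf

def q_learning_episode(alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=200):
    """One Q-learning episode; obstacle and trap cells are never entered or updated."""
    state = (0, 0)
    for _ in range(max_steps):
        r, c = state
        if np.random.rand() < epsilon:
            a = np.random.randint(len(ACTIONS))
        else:
            a = int(np.argmax(Q[r, c]))
        dr, dc = ACTIONS[a]
        nxt = (r + dr, c + dc)
        if (not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS)
                or nxt in OBSTACLES or nxt in TRAPS):
            continue
        reward = 100.0 if nxt == GOAL else -1.0       # assumed reward scheme
        Q[r, c, a] += alpha * (reward + gamma * np.max(Q[nxt[0], nxt[1]]) - Q[r, c, a])
        if nxt == GOAL:
            break
        state = nxt

for _ in range(200):
    q_learning_episode()
```

Because the Q-table starts from the attractive potential rather than zeros, the very first greedy rollout already heads toward the goal, and masking obstacle and trap transitions means no learning effort is spent on states the paper argues should be excluded.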

Key words: path planning, reinforcement learning, artificial potential field, trap search, Q value initialization