Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (16): 129-134. DOI: 10.3778/j.issn.1002-8331.1704-0427

• Pattern Recognition and Artificial Intelligence •


Reinforcement learning path planning algorithm based on gravitational potential field and trap search

DONG Peifang1, ZHANG Zhi’an1, MEI Xinhu2, ZHU Shuo1   

  1. School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
  • Online: 2018-08-15    Published: 2018-08-09


Abstract: It is difficult for a mobile robot to obtain a near-optimal path in a complex environment. The Q-learning algorithm, based on a Markov decision process, can learn a good path by trial and error, but it converges slowly, requires many iterations, and its trial-and-error approach cannot be applied in a real environment. This paper adds a gravitational potential field to the Q-learning algorithm as prior information about the initial environment, then searches the environment layer by layer for trap regions and excludes concave trap regions from the Q-value iteration, which speeds up the convergence of path planning. At the same time, trial-and-error learning of obstacles is eliminated, so the algorithm avoids obstacles effectively from the initial state and can learn directly in a real environment. Complex maps are built with Python and the pygame module to verify the path-planning performance of the improved Q-learning algorithm with the initial gravitational potential field and trap search. Simulation results show that the improved algorithm reaches the target position quickly and effectively after fewer iterations, and the resulting path is near-optimal.
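
The abstract gives no code, so the sketch below is only a rough illustration of the two mechanisms it describes: seeding a grid-world Q-table with an attractive (gravitational) potential toward the goal, and masking concave "trap" cells out of the Q-value iteration. The grid size, obstacle layout, reward values, and the single-cell trap test are assumptions made for this example and simplify the paper's layer-by-layer trap search; this is not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): potential-field Q initialization
# plus exclusion of concave trap cells from Q-learning on a small grid world.
import numpy as np

ROWS, COLS = 10, 10
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
GOAL = (9, 9)
OBSTACLES = {(4, 4), (4, 5), (4, 6), (5, 6), (6, 4), (6, 5), (6, 6)}  # example map

def attractive_potential(cell, goal, k=1.0):
    """Attractive potential: larger (less negative) the closer to the goal."""
    return -k * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

# (1) Initialize each state-action value from the potential of the successor
# cell, so the initial greedy policy already points roughly toward the goal
# and moves into walls/obstacles are never selected (no trial and error there).
Q = np.zeros((ROWS, COLS, len(ACTIONS)))
for r in range(ROWS):
    for c in range(COLS):
        for a, (dr, dc) in enumerate(ACTIONS):
            nr, nc = r + dr, c + dc
            if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in OBSTACLES:
                Q[r, c, a] = attractive_potential((nr, nc), GOAL)
            else:
                Q[r, c, a] = -np.inf

# (2) Crude concave-trap test (an assumption standing in for the paper's
# layer-by-layer search): a free cell walled in on three of its four sides.
def is_trap(cell):
    if cell in OBSTACLES or cell == GOAL:
        return False
    blocked = 0
    for dr, dc in ACTIONS:
        nr, nc = cell[0] + dr, cell[1] + dc
        if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in OBSTACLES:
            blocked += 1
    return blocked >= 3

TRAPS = {(r, c) for r in range(ROWS) for c in range(COLS) if is_trap((r, c))}

# Remove trap cells from the Q-value iteration: moves into them are never chosen.
for r in range(ROWS):
    for c in range(COLS):
        for a, (dr, dc) in enumerate(ACTIONS):
            if (r + dr, c + dc) in TRAPS:
                Q[r, c, a] = -np.inf

def q_learning_episode(alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=200):
    """One Q-learning episode; obstacle and trap cells are never entered or updated."""
    state = (0, 0)
    for _ in range(max_steps):
        r, c = state
        if np.random.rand() < epsilon:
            a = np.random.randint(len(ACTIONS))
        else:
            a = int(np.argmax(Q[r, c]))
        dr, dc = ACTIONS[a]
        nxt = (r + dr, c + dc)
        if (not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS)
                or nxt in OBSTACLES or nxt in TRAPS):
            continue
        reward = 100.0 if nxt == GOAL else -1.0       # assumed reward scheme
        Q[r, c, a] += alpha * (reward + gamma * np.max(Q[nxt[0], nxt[1]]) - Q[r, c, a])
        if nxt == GOAL:
            break
        state = nxt

for _ in range(200):
    q_learning_episode()
```

Because the Q-table starts from the attractive potential rather than zeros, the very first greedy rollout already heads toward the goal, and masking obstacle and trap transitions means no learning effort is spent on states the paper argues should be excluded.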

Key words: path planning, reinforcement learning, artificial potential field, trap search, Q value initialization