Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (18): 270-274. DOI: 10.3778/j.issn.1002-8331.2011-0414

• Engineering and Applications •

Path Planning for Mobile Robot Using Improved Reinforcement Learning Algorithm

WANG Keyin, SHI Zhen, YANG Zhengcai, YANG Yahui, WANG Sishan   

  1. School of Automotive Engineering, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China
    2. Key Laboratory of Automotive Power Train and Electronics (Hubei University of Automotive Technology), Shiyan, Hubei 442002, China
    3. Institute of Automotive Engineers, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China
  • Online: 2021-09-15  Published: 2021-09-13

Abstract:

To solve the problems of slow convergence, a large number of iterations, and unstable convergence results that arise when a traditional reinforcement learning algorithm is applied to mobile robot path planning in an unknown environment, an improved Q-learning algorithm is proposed. An artificial potential field is used to initialize the state values, so that states closer to the target position receive larger values; this guides the agent toward the target and reduces the large number of invalid iterations caused by environment exploration in the early stage of the algorithm. An improved ε-greedy strategy is employed for the agent's action selection: the greedy factor ε is adjusted dynamically according to the convergence degree of the algorithm, which better balances exploration and exploitation, accelerates convergence, and improves the stability of the convergence results. The proposed algorithm is simulated and verified on a grid map built with Python's standard Tkinter library. Simulation results show that, compared with the traditional Q-learning algorithm, the improved Q-learning algorithm reduces path planning time by 85.1% and the number of iterations before convergence by 74.7%, while the stability of the convergence results is also improved.
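
The following Python sketch illustrates the two modifications the abstract describes: state values initialized from an artificial potential field, and a greedy factor ε adjusted as learning progresses. The abstract does not give the paper's exact potential function or ε schedule, so a Manhattan-distance attractive potential and a linear ε decay are assumed here; the toy grid environment and all names (attractive_potential, epsilon, step) are hypothetical illustrations, not the paper's implementation.

import numpy as np

# Toy grid world standing in for the paper's Tkinter grid map (assumed setup).
ROWS, COLS = 10, 10
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA = 0.1, 0.9                        # learning rate, discount factor

def attractive_potential(state, goal=GOAL, k=1.0):
    # Artificial-potential-field value: larger for states nearer the goal.
    # A Manhattan-distance form is assumed; the paper's formula may differ.
    d = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return k / (d + 1.0)

# Improvement 1: initialize every state-action value from the potential field
# instead of zero, so the initial greedy policy already points toward the goal.
Q = np.zeros((ROWS, COLS, len(ACTIONS)))
for r in range(ROWS):
    for c in range(COLS):
        Q[r, c, :] = attractive_potential((r, c))

def epsilon(episode, total, eps_max=0.9, eps_min=0.05):
    # Improvement 2: dynamic greedy factor. A linear decay over episodes is
    # assumed here; the paper ties the schedule to the convergence degree.
    frac = episode / max(total - 1, 1)
    return eps_max - (eps_max - eps_min) * frac

def choose_action(state, eps):
    # epsilon-greedy selection: explore with probability eps, else exploit.
    if np.random.rand() < eps:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[state[0], state[1]]))

def step(state, a):
    # Hypothetical environment: clamp moves to the grid, small step penalty,
    # reward +1 on reaching the goal.
    r = min(max(state[0] + ACTIONS[a][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[a][1], 0), COLS - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

EPISODES = 500
for ep in range(EPISODES):
    s, done = START, False
    eps = epsilon(ep, EPISODES)
    while not done:
        a = choose_action(s, eps)
        s2, reward, done = step(s, a)
        # Standard Q-learning update; only the initialization and the eps
        # schedule differ from the traditional algorithm.
        td_target = reward + GAMMA * np.max(Q[s2[0], s2[1]])
        Q[s[0], s[1], a] += ALPHA * (td_target - Q[s[0], s[1], a])
        s = s2

Because the potential field gives states nearer the goal larger initial values, the agent's first greedy moves already trend toward the target, which is what removes the invalid early iterations the abstract attributes to blind exploration; the decaying ε then shifts the balance from exploration to exploitation as the value estimates stabilize.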

Key words: reinforcement learning, artificial potential field, greedy strategy, mobile robots, path planning