Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (18): 316-322. DOI: 10.3778/j.issn.1002-8331.2206-0366

• Engineering and Applications •

Research on Reinforcement Learning of Pedestrian Avoidance Approach for Mobile Robots

WANG Weijian, WANG Yong, YANG Xiao, LYU Zongzhe, WU Zongyi

  1. Beijing Research Institute of Automation for Machinery Industry, Beijing 100120, China
  2. RIAMB (Beijing) Technology Development Co., Ltd., Beijing 100120, China
  • Online: 2023-09-15  Published: 2023-09-15

Abstract: A deep reinforcement learning approach is proposed to improve the pedestrian avoidance capability of mobile robots navigating in crowded environments. The problem is first modeled in a reinforcement learning formulation in which the state space, the action space and the reward function are defined. Graph convolutional networks (GCN) are then used to aggregate the latent features of the robot and the pedestrians, together with their pairwise relational features, into deep interaction features between the robot and the pedestrians, which are used to estimate the value of each state-action pair; interaction features among the pedestrians are extracted in the same way and used to predict future pedestrian states. Finally, an improved Monte Carlo tree search (MCTS) method enables the robot to evaluate the expected return of its next K actions by predicting pedestrian states and interacting with a simulated environment, and thus to choose a more foresighted navigation path. Experiments show that introducing pedestrian state prediction and the improved MCTS shortens the robot's navigation time and improves its avoidance behavior. The proposed approach achieves performance close to the state-of-the-art model in the open-source CrowdNav simulation scenario while requiring less running time.
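
To make the feature aggregation step concrete, the following is a minimal, hypothetical PyTorch-style sketch of a GCN that embeds the robot and pedestrian states, propagates them over a similarity-based graph, and outputs a value estimate for the robot node together with per-pedestrian interaction features. The layer sizes, the adjacency construction and all names, such as RobotCrowdGCN, are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RobotCrowdGCN(nn.Module):
    """Hypothetical sketch: GCN-style aggregation of robot and pedestrian features."""

    def __init__(self, robot_dim=9, human_dim=5, embed_dim=32):
        super().__init__()
        # Embed the heterogeneous robot / pedestrian states into a shared latent space.
        self.robot_embed = nn.Linear(robot_dim, embed_dim)
        self.human_embed = nn.Linear(human_dim, embed_dim)
        # Two graph-convolution (message-passing) layers.
        self.gc1 = nn.Linear(embed_dim, embed_dim)
        self.gc2 = nn.Linear(embed_dim, embed_dim)
        # Value head scoring the robot node's deep interaction feature.
        self.value_head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, robot_state, human_states):
        # robot_state: (robot_dim,), human_states: (n_humans, human_dim)
        nodes = torch.cat([self.robot_embed(robot_state).unsqueeze(0),
                           self.human_embed(human_states)], dim=0)   # (n+1, embed_dim)
        # Similarity-based adjacency (an assumption): how strongly nodes influence each other.
        adj = F.softmax(nodes @ nodes.t(), dim=1)                    # (n+1, n+1)
        h = F.relu(self.gc1(adj @ nodes))   # first propagation step
        h = F.relu(self.gc2(adj @ h))       # second propagation step
        robot_feature, human_features = h[0], h[1:]
        # Robot node -> state value; pedestrian nodes -> per-pedestrian interaction features.
        return self.value_head(robot_feature), human_features
```

In such a design, the per-pedestrian features returned alongside the value could feed a separate head that predicts each pedestrian's next state, as the abstract describes.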

Key words: mobile robots, obstacle avoidance, deep reinforcement learning, graph convolutional networks, Monte Carlo tree search (MCTS)
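
The K-step lookahead can be pictured with the simplified sketch below: at each depth the search predicts the pedestrians' next states, simulates the robot's candidate actions, keeps only the highest-scoring branches, and backs up discounted rewards plus a learned value estimate at the leaves. The helpers value_fn, predict_humans and simulate_step, the pruning width and the discount factor are assumptions for illustration, not the paper's exact improved MCTS.

```python
# Hypothetical K-step lookahead sketch (a pruned tree search, not the authors' exact MCTS).
def k_step_lookahead(robot_state, human_states, k, value_fn, predict_humans,
                     simulate_step, actions, gamma=0.9, top_n=3):
    """Return (best_action, best_return) for a depth-k search from the given state."""
    if k == 0:
        # Leaf: fall back to the learned value estimate.
        return None, value_fn(robot_state, human_states)

    next_humans = predict_humans(human_states)  # learned pedestrian state prediction
    scored = []
    for action in actions:
        # One simulated environment step for this candidate action.
        next_robot, reward, done = simulate_step(robot_state, action, next_humans)
        one_step = reward + gamma * value_fn(next_robot, next_humans)
        scored.append((one_step, action, next_robot, reward, done))

    best_action, best_return = None, float("-inf")
    # Expand only the top-n one-step candidates; this pruning keeps the tree small.
    for _, action, next_robot, reward, done in sorted(scored, key=lambda s: s[0],
                                                      reverse=True)[:top_n]:
        if done:
            ret = reward
        else:
            _, future = k_step_lookahead(next_robot, next_humans, k - 1, value_fn,
                                         predict_humans, simulate_step, actions, gamma, top_n)
            ret = reward + gamma * future
        if ret > best_return:
            best_action, best_return = action, ret
    return best_action, best_return
```

A controller would call this once per control cycle, for example best_action, _ = k_step_lookahead(robot, humans, K, ...), execute only best_action, and replan at the next step.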