Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (18): 316-322.DOI: 10.3778/j.issn.1002-8331.2206-0366

• Engineering and Applications •

Research on Reinforcement Learning of Pedestrian Avoidance Approach for Mobile Robots

WANG Weijian, WANG Yong, YANG Xiao, LYU Zongzhe, WU Zongyi   

  1.Beijing Research Institute of Automation for Machinery Industry, Beijing 100120, China
    2.RIAMB (Beijing) Technology Development Co., Ltd., Beijing 100120, China
  • Online: 2023-09-15  Published: 2023-09-15

Research on Reinforcement Learning of Pedestrian Avoidance Approach for Mobile Robots

WANG Weijian, WANG Yong, YANG Xiao, LYU Zongzhe, WU Zongyi

  1.Beijing Research Institute of Automation for Machinery Industry, Beijing 100120, China
    2.RIAMB (Beijing) Technology Development Co., Ltd., Beijing 100120, China

Abstract: A deep reinforcement learning approach is proposed to improve the pedestrian avoidance performance of mobile robots navigating in crowded environments. Firstly, the problem is formulated as a reinforcement learning task, in which the state space, action space and reward function are defined. Secondly, graph convolutional networks are used to generate deep interaction features by aggregating the latent features of the robot and pedestrians as well as their pairwise relational features; these interaction features are used to estimate the value of state-action pairs. Graph convolutional networks are also used to extract deep interaction features among pedestrians to predict future pedestrian states. Finally, an improved Monte Carlo tree search (MCTS) method is adopted so that the robot can evaluate the expected return over the coming K steps, by predicting pedestrian states and performing simulated interaction with the environment, and thus choose a more foresighted navigation path. Experiments show that pedestrian state prediction and the improved MCTS method shorten the robot's navigation time and improve its avoidance performance. The proposed approach achieves performance close to the state-of-the-art method in the open-source CrowdNav simulation scenario while consuming less time.
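
To make the described pipeline concrete, the following minimal Python (PyTorch) sketch illustrates the two ideas in the abstract: a graph-convolution-style network that aggregates robot and pedestrian features to estimate a state value, and a K-step lookahead that scores actions through simulated interaction before committing to one. All names (InteractionValueNet, k_step_lookahead), feature sizes, the similarity-based adjacency, and the environment hooks simulate/predict_pedestrians are illustrative assumptions; the exhaustive lookahead below merely stands in for the paper's improved MCTS and is not the authors' implementation.

    # Illustrative sketch only: GCN-style value estimation over robot/pedestrian
    # nodes plus a simple K-step lookahead; details are assumptions, not the
    # authors' code.
    import torch
    import torch.nn as nn

    class InteractionValueNet(nn.Module):
        """Graph-convolution-style value estimator over robot + pedestrian nodes."""

        def __init__(self, feat_dim: int = 6, hidden_dim: int = 32):
            super().__init__()
            self.embed = nn.Linear(feat_dim, hidden_dim)   # per-node latent features
            self.gcn = nn.Linear(hidden_dim, hidden_dim)   # one graph-conv layer
            self.value_head = nn.Linear(hidden_dim, 1)     # value read at the robot node

        def forward(self, nodes: torch.Tensor) -> torch.Tensor:
            # nodes: (N, feat_dim); node 0 is the robot, nodes 1..N-1 are pedestrians
            h = torch.relu(self.embed(nodes))
            # similarity-based, row-normalized adjacency (a common GCN simplification)
            adj = torch.softmax(h @ h.t(), dim=-1)
            h = torch.relu(self.gcn(adj @ h))              # aggregate pairwise relations
            return self.value_head(h[0])                   # value estimated at robot node

    def k_step_lookahead(value_net, state, actions, simulate, predict_pedestrians,
                         k: int = 2, gamma: float = 0.9):
        """Pick the action whose simulated K-step return (plus bootstrapped value) is largest.

        simulate(state, action) -> (next_state, reward) and
        predict_pedestrians(state) -> state with predicted pedestrian positions
        are environment-model hooks assumed to exist for this sketch.
        """
        def rollout(s, depth):
            if depth == 0:
                with torch.no_grad():
                    return value_net(s).item()             # bootstrap with the learned value
            best = float("-inf")
            for a in actions:
                s_pred = predict_pedestrians(s)            # predicted pedestrian states
                s_next, r = simulate(s_pred, a)            # simulated interaction
                best = max(best, r + gamma * rollout(s_next, depth - 1))
            return best

        returns = []
        for a in actions:
            s_next, r = simulate(predict_pedestrians(state), a)
            returns.append(r + gamma * rollout(s_next, k - 1))
        return actions[int(torch.tensor(returns).argmax())]

In this simplified form the lookahead enumerates every action sequence of length K, which is tractable only for a small discrete action set; the paper's improved MCTS would instead expand the search tree selectively.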

Key words: mobile robots, obstacle avoidance, deep reinforcement learning, graph convolutional networks, Monte Carlo tree search (MCTS)

Abstract: To improve the pedestrian avoidance capability of mobile robots in crowded scenarios, a deep reinforcement learning pedestrian avoidance method is proposed. The problem is modeled according to the reinforcement learning paradigm, and the state space, action space and reward function are specified. Graph convolutional networks (GCN) are used to aggregate the latent features of the robot and pedestrians as well as their pairwise relational features, producing deep interaction features between the robot and pedestrians for value estimation of state-action pairs, while deep interaction features among pedestrians are also extracted for pedestrian state prediction. An improved Monte Carlo tree search method is applied so that the robot evaluates the expected return of actions over the next K steps through pedestrian state prediction and simulated interaction with the environment, thereby choosing a more foresighted navigation path. Experiments show that introducing pedestrian state prediction and the improved Monte Carlo tree search shortens the robot's navigation time and improves its avoidance performance. The proposed method achieves performance close to the SOTA model in the open-source simulation scenario CrowdNav with a shorter running time.

Key words: mobile robots, obstacle avoidance algorithm, deep reinforcement learning, graph convolutional networks, Monte Carlo tree search