Path Planning for Indoor Mobile Robot with Improved Deep Reinforcement Learning

doi:10.3778/j.issn.1002-8331.2106-0040

Abstract:

Traditional deep reinforcement learning, when used for path planning of mobile robots in unknown indoor environments, suffers from poor exploration ability and sparse rewards over the environment state space. To address these problems, an improved deep reinforcement learning algorithm based on depth image information is proposed. The depth image captured directly by a Kinect visual sensor and the target position are used as the network input, and the robot's linear and angular velocities are output as the next action command. An improved reward and punishment function is designed that raises the reward obtained by the algorithm and optimizes the state space, which alleviates the sparse-reward problem to a certain extent. Simulation results show that the improved algorithm strengthens the robot's exploration ability and optimizes the path trajectory, so that the robot avoids obstacles effectively and plans shorter paths. Compared with the DQN algorithm, the average path length is shortened by 21.4% in the simple environment and by 11.3% in the complex environment.

Key words: path planning, depth image information, Kinect visual sensor, deep reinforcement learning, reward and punishment function, exploration ability
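
The abstract does not give the form of the improved reward and punishment function. As an illustration only, the sketch below shows one common way to make a sparse navigation reward denser: a small progress term is added at every step, alongside the terminal rewards for reaching the target or colliding. All names, thresholds, and reward values here are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical constants; the abstract does not state the paper's actual values.
GOAL_RADIUS = 0.3          # metres: closer than this counts as reaching the target
COLLISION_RANGE = 0.2      # metres: a nearer obstacle counts as a collision
GOAL_REWARD = 100.0        # terminal reward for reaching the target
COLLISION_PENALTY = -100.0 # terminal penalty for hitting an obstacle
PROGRESS_GAIN = 10.0       # reward per metre of progress toward the target
STEP_COST = -0.1           # small per-step cost that favours shorter paths


@dataclass
class StepInfo:
    prev_goal_dist: float  # distance to the target before the action (m)
    goal_dist: float       # distance to the target after the action (m)
    min_depth: float       # smallest valid reading in the current depth image (m)


def shaped_reward(info: StepInfo) -> Tuple[float, bool]:
    """Return (reward, episode_done) for a single environment step."""
    if info.min_depth < COLLISION_RANGE:
        return COLLISION_PENALTY, True
    if info.goal_dist < GOAL_RADIUS:
        return GOAL_REWARD, True
    # Dense term: positive when the robot moved closer to the target,
    # negative when it moved away, so most steps carry a learning signal.
    progress = info.prev_goal_dist - info.goal_dist
    return PROGRESS_GAIN * progress + STEP_COST, False


if __name__ == "__main__":
    # One step that moves the robot 0.5 m closer with no obstacle nearby.
    print(shaped_reward(StepInfo(prev_goal_dist=2.0, goal_dist=1.5, min_depth=1.0)))
```

With a purely terminal reward, almost every transition would return zero; the progress term is what gives the agent feedback before it ever reaches the target, which is the general idea behind mitigating reward sparsity.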

CHENG Yi (成怡), HAO Mimi (郝密密). Path Planning for Indoor Mobile Robot with Improved Deep Reinforcement Learning[J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(21): 256-262.
