Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (19): 158-166.DOI: 10.3778/j.issn.1002-8331.2307-0009

• Pattern Recognition and Artificial Intelligence •

Robot Path Planning Based on Self-Attention Mechanism Combined with DDPG

WANG Fengying, CHEN Ying, YUAN Shuai, DU Liming   

  1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China
    2. School of Information Engineering, Suqian University, Suqian, Jiangsu 223800, China
  • Online: 2024-10-01  Published: 2024-09-30


Abstract: To better address the problems of low sample utilization, sparse rewards, and slow convergence of the network model in path planning with the deep deterministic policy gradient (DDPG) algorithm, an improved DDPG algorithm is proposed. A self-attention mechanism is applied to the image information obtained from the robot's camera sensor, and the dot-product method is used to compute the correlation between images, so that high attention weights are focused precisely on obstacle information. In complex environments, the robot's lack of experience makes it difficult to obtain positive-feedback rewards, which limits its exploration ability. By combining the DDPG algorithm with hindsight experience replay (HER), a DDPG-HER algorithm is proposed that effectively exploits both positive and negative feedback, enabling the robot to learn appropriate rewards from both successful and failed experiences. Static and dynamic simulation environments are built in Gazebo for training and testing. The experimental results show that the proposed algorithm significantly improves sample utilization, accelerates convergence of the network model, and alleviates the sparse-reward problem, so that the robot can efficiently avoid obstacles and reach the target point when planning paths in unknown environments.
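The dot-product attention step described in the abstract can be sketched as follows. This is a minimal NumPy illustration only: the patch count, feature dimension, and scaling factor are assumptions for the sketch, not details taken from the paper.

```python
# Illustrative dot-product self-attention over flattened image features.
# Assumes a feature map reshaped to (N, d): N patch vectors of dimension d.
import numpy as np

def dot_product_self_attention(features: np.ndarray) -> np.ndarray:
    """Re-weight each patch by its dot-product similarity to all patches."""
    n, d = features.shape
    scores = features @ features.T / np.sqrt(d)      # (N, N) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ features                        # attention-weighted features

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))                     # e.g. 16 patches, 8-dim each
out = dot_product_self_attention(feats)
print(out.shape)                                     # (16, 8)
```

In the paper's setting, patches dominated by obstacle pixels would receive high similarity with each other and thus high attention weight; here the inputs are random, so the sketch only demonstrates the mechanics.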

Key words: deep reinforcement learning, deep deterministic policy gradient (DDPG), hindsight experience replay (HER), self-attention mechanism, robot path planning
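The hindsight experience replay (HER) idea summarized above, relabeling a failed episode as if its final achieved state had been the goal so that sparse rewards still yield learning signal, can be sketched as follows. The transition layout and the sparse reward function are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of HER "final-state" relabeling (illustrative data layout).
import numpy as np

def her_relabel(episode, reward_fn):
    """Relabel an episode using its final achieved state as the hindsight goal."""
    new_goal = episode[-1]["achieved"]               # hindsight goal
    return [
        {
            "state": t["state"],
            "action": t["action"],
            "goal": new_goal,
            "reward": reward_fn(t["achieved"], new_goal),
        }
        for t in episode
    ]

def sparse_reward(achieved, goal, tol=0.05):
    """0 if the goal is reached within tolerance, -1 otherwise (sparse)."""
    return 0.0 if np.linalg.norm(achieved - goal) < tol else -1.0

# A toy two-step "failed" episode: the original goal was never reached.
episode = [
    {"state": np.zeros(2), "action": 0, "achieved": np.array([0.1, 0.0])},
    {"state": np.ones(2),  "action": 1, "achieved": np.array([0.9, 0.9])},
]
out = her_relabel(episode, sparse_reward)
print(out[-1]["reward"])                             # 0.0: final step hits the hindsight goal
```

After relabeling, the final transition carries a positive-feedback (non-penalty) reward even though the episode failed against its original goal, which is exactly how DDPG-HER extracts learning signal from failures.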
