Computer Engineering and Applications, 2024, Vol. 60, Issue (19): 158-166. DOI: 10.3778/j.issn.1002-8331.2307-0009

• Pattern Recognition and Artificial Intelligence •

Robot Path Planning Based on Self-Attention Mechanism Combined with DDPG

WANG Fengying, CHEN Ying, YUAN Shuai, DU Liming   

  1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China
    2. School of Information Engineering, Suqian University, Suqian, Jiangsu 223800, China
  • Online: 2024-10-01 Published: 2024-09-30

Abstract: To address the low sample utilization, sparse rewards, and slow stabilization of the network model that the deep deterministic policy gradient (DDPG) algorithm exhibits in path planning, an improved DDPG algorithm is proposed. A self-attention mechanism is applied to the image information acquired by the robot's camera sensor, and the dot-product method is used to compute the correlation between images, so that higher weights are focused precisely on obstacle information. In complex environments, a robot that lacks experience rarely obtains positive rewards, which limits its exploration ability. Combining DDPG with hindsight experience replay (HER) yields the DDPG-HER algorithm, which makes effective use of both positive and negative feedback so that the robot can learn appropriate rewards from both successful and failed experiences. Static and dynamic simulation environments are built in Gazebo for training and testing. The experimental results show that the proposed algorithm significantly improves sample utilization, speeds up the stabilization of the network model, and addresses the sparse-reward problem, enabling the robot to avoid obstacles efficiently and reach the target point when planning paths in unknown environments.
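The abstract above says only that dot-product self-attention is applied to camera images so that obstacle information receives higher weights. As a hedged illustration, the following is a minimal pure-Python sketch of scaled dot-product self-attention. It assumes each row of `features` is a feature vector for one image region and that `w_q`, `w_k`, `w_v` are query/key/value projection matrices (given here as fixed illustrative values, not learned weights). The projections and the 1/√d_k scaling follow the standard Transformer formulation and are assumptions, not details confirmed by the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over the rows of x.

    Each row of x is assumed to be the feature vector of one image region.
    Returns (attended features, attention weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Dot products measure the pairwise correlation between regions.
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Each row becomes a distribution: how strongly region i attends to each region j.
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v), weights


# Illustrative input: three 2-D region features with identity projections.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attended, attn = self_attention(features, identity, identity, identity)
```

With identity projections, the third region (which overlaps both others) attends most strongly to itself. In a trained network the projections would be learned, which is what would let regions containing obstacles draw larger weights.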

Key words: deep reinforcement learning, deep deterministic policy gradient (DDPG), hindsight experience replay (HER), self-attention mechanism, robot path planning
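The DDPG-HER combination described in the abstract relies on hindsight relabeling: a failed episode is replayed as if the positions the robot actually reached had been the goal, so sparse-reward trajectories still yield positive feedback. The sketch below shows the generic "future" relabeling strategy from hindsight experience replay in pure Python. The `Transition` layout, the 2-D positions, the `sparse_reward` function (0 at the goal, -1 otherwise), the tolerance, and `k` are hypothetical choices for illustration; the abstract does not specify the paper's state, reward, or sampling details.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Transition:
    state: Point       # robot position before the action
    action: Point      # hypothetical 2-D velocity command
    next_state: Point  # robot position after the action (the achieved goal)
    goal: Point        # target the transition is conditioned on
    reward: float


def sparse_reward(position: Point, goal: Point, tol: float = 0.1) -> float:
    """Hypothetical sparse reward: 0 when within tol of the goal, -1 otherwise."""
    return 0.0 if math.dist(position, goal) <= tol else -1.0


def her_relabel(episode: List[Transition], k: int = 4,
                rng: Optional[random.Random] = None) -> List[Transition]:
    """Hindsight relabeling with the 'future' strategy.

    Keeps every original transition and adds k copies whose goal is a
    position actually reached at the same or a later step of the episode.
    """
    rng = rng or random.Random()
    relabeled: List[Transition] = []
    for t, tr in enumerate(episode):
        relabeled.append(tr)
        future = episode[t:]
        for _ in range(k):
            new_goal = rng.choice(future).next_state
            relabeled.append(Transition(tr.state, tr.action, tr.next_state, new_goal,
                                        sparse_reward(tr.next_state, new_goal)))
    return relabeled


# A failed episode: the robot moves along x and never reaches (5, 5).
goal = (5.0, 5.0)
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
episode = [Transition(path[i], (1.0, 0.0), path[i + 1], goal,
                      sparse_reward(path[i + 1], goal)) for i in range(3)]
buffer = her_relabel(episode, k=4, rng=random.Random(0))
```

Every original transition in this episode has reward -1, but the relabeled copies of the last step are guaranteed a reward of 0. In a DDPG-HER setup, such transitions would be stored in the replay buffer, with the goal fed to the actor and critic as an extra input.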