Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (19): 316-322. DOI: 10.3778/j.issn.1002-8331.2208-0422

• Engineering and Applications •

Path Planning Method for Mobile Robot Based on Curiosity Distillation Double Q-Network

ZHANG Feng, GU Qiran, YUAN Shuai   

  1. School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110168, China
  • Online: 2023-10-01  Published: 2023-10-01

Abstract: The DQN algorithm suffers from overestimation, low sample utilization, and sparse rewards in mobile robot path planning, which keeps the robot from finding an optimal path. To address this, an end-to-end path planning method based on improved deep reinforcement learning is proposed: the curiosity distillation module dueling double deep Q-network with prioritized experience replay (CDM-D3QN-PER). The method builds on D3QN, which reduces the adverse effect of overestimation. A long short-term memory (LSTM) network is added at the input to process radar and camera information and extract more useful environmental features. Prioritized experience replay (PER) is adopted as the sampling method so that samples are fully exploited and sample utilization improves, and a curiosity distillation module (CDM) is introduced to alleviate the reward-sparsity problem. Simulation results show that, compared with DQN, DDQN, and D3QN, a robot trained with CDM-D3QN-PER reaches the target point significantly more often, three times as often as with DQN. The algorithm raises the reward value, accelerates convergence, and obtains the optimal path in complex unknown environments.
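To make the architecture concrete, the sketch below illustrates two of the D3QN ingredients named above: a dueling Q-head combined with the double-Q target, with PER importance weights in the loss. This is a minimal sketch only; the LSTM front end over radar/camera features, the network sizes, and all hyperparameters are illustrative assumptions, not the paper's configuration.

# Minimal sketch of the D3QN pieces described in the abstract: a dueling
# Q-network and the double-Q target with PER importance weights.
# Sizes and the LSTM input stage are illustrative assumptions.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Stand-in for the paper's LSTM front end over sensor features.
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantages A(s,a)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, seq_len, obs_dim); use the last hidden state.
        _, (h, _) = self.encoder(obs_seq)
        h = h[-1]
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_loss(online, target, batch, gamma=0.99):
    """Double-Q target: the online net picks the action, the target net
    evaluates it. `batch` holds PER samples with importance weights `w`
    that correct the bias introduced by prioritized sampling."""
    obs, act, rew, next_obs, done, w = batch
    q = online(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_act = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_act).squeeze(1)
        y = rew + gamma * (1.0 - done) * next_q
    td_error = q - y          # its magnitude also updates PER priorities
    return (w * td_error.pow(2)).mean(), td_error.detach().abs()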

Key words: deep Q-network (DQN) algorithm, D3QN algorithm, curiosity distillation module, long short-term memory (LSTM), optimal path
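The curiosity distillation module (CDM) is only named in the abstract. A common way to realize curiosity by distillation is random network distillation (RND): a predictor network is trained to match a fixed, randomly initialized target network, and the prediction error on a state serves as an intrinsic exploration bonus when extrinsic rewards are sparse. The sketch below assumes this RND-style design; the class name, layer sizes, and bonus weight are illustrative, and whether the paper's CDM matches this exactly is an assumption.

# Hedged sketch of an RND-style curiosity bonus, one common realization of
# the kind of curiosity-by-distillation module the abstract names (CDM).
import torch
import torch.nn as nn

class CuriosityDistillation(nn.Module):
    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        # Fixed, randomly initialized target network (never trained).
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor is distilled toward the target; its error = novelty.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Large error on rarely visited states -> large exploration bonus.
        with torch.no_grad():
            tgt = self.target(obs)
        err = (self.predictor(obs) - tgt).pow(2).mean(dim=1)
        return err  # also the distillation loss minimized on visited states

# Usage: total reward = extrinsic + beta * intrinsic (beta is an assumption).
# r_total = r_env + 0.1 * cdm.intrinsic_reward(next_obs).detach()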