Computer Engineering and Applications, 2019, Vol. 55, Issue (13): 15-19. DOI: 10.3778/j.issn.1002-8331.1812-0321

Path Planning for Mobile Robot Based on Deep Reinforcement Learning

DONG Yao1,2, GE Yingying1,2, GUO Hongyong1,3, DONG Yongfeng1,2, YANG Chen1,2   

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
    2. Hebei Provincial Key Laboratory of Big Data Computing, Hebei University of Technology, Tianjin 300401, China
    3. Hebei University of Engineering, Handan, Hebei 056038, China
  Online: 2019-07-01    Published: 2019-07-01

Abstract: To address the slow convergence of the basic Deep Q-Network when a robot explores a complex, unknown environment, an improved Deep Double Q-Network algorithm based on a dueling network structure (Improved Dueling Deep Double Q-Network, IDDDQN) is proposed. The mobile robot estimates the state-action value function of its three actions through the improved DDQN network, updates the network parameters, and obtains the corresponding Q values through training. Using an exploration strategy that combines the Boltzmann distribution with ε-greedy, the mobile robot chooses an optimal action and moves to the next observation. The data collected through learning are stored in the experience replay memory with an improved resampling-preferential mechanism, and the network is trained with mini-batch data. The experimental results show that, compared with the basic DDQN algorithm, a mobile robot using IDDDQN adapts to the unknown environment more quickly, the network converges faster, the success rate of reaching the target position increases more than threefold, and the optimal path can be obtained in an unknown complex environment.
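For illustration only, the following minimal sketch (not taken from the paper; the function names dueling_q_values and select_action, the three discrete actions, the temperature tau, and the exploration rate epsilon are assumptions) shows how a dueling head can combine a state value and per-action advantages into Q values, and how a Boltzmann distribution can be mixed with an ε-greedy rule when choosing one of the robot's three actions.

# Minimal illustrative sketch (not the authors' implementation): a dueling head
# aggregates a scalar state value V(s) and per-action advantages A(s, a) into
# Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); action selection mixes a Boltzmann
# (softmax) distribution over Q values with an epsilon-greedy rule.
import numpy as np

N_ACTIONS = 3  # the robot's three actions (e.g. forward, turn left, turn right) -- assumed

def dueling_q_values(state_value: float, advantages: np.ndarray) -> np.ndarray:
    """Combine V(s) and A(s, .) into Q(s, .) as in a dueling network head."""
    return state_value + (advantages - advantages.mean())

def select_action(q_values: np.ndarray, epsilon: float, tau: float,
                  rng: np.random.Generator) -> int:
    """Assumed mixing scheme: with probability epsilon, sample from the
    Boltzmann (softmax) distribution over Q values at temperature tau;
    otherwise act greedily on the Q values."""
    if rng.random() < epsilon:
        logits = q_values / tau
        logits -= logits.max()                              # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return int(rng.choice(len(q_values), p=probs))
    return int(np.argmax(q_values))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = dueling_q_values(state_value=1.2, advantages=np.array([0.3, -0.1, 0.5]))
    print("Q values:", q, "chosen action:", select_action(q, epsilon=0.1, tau=0.5, rng=rng))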

Key words: Deep Double Q-Network (DDQN), dueling network, resampled experience replay memory, Boltzmann distribution, ε-greedy policy
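Complementing the sketch above, the next hedged example (again not the authors' code; the discount factor gamma, the terminal-state mask, and the uniform mini-batch sampling are assumptions, whereas the paper's resampling-preferential mechanism would instead reweight which transitions are drawn) shows how the Double DQN target is commonly computed for a mini-batch from the experience replay memory: the online network selects the next action and the target network evaluates it.

# Illustrative Double DQN target computation for a sampled mini-batch.
import numpy as np

def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'.

    rewards, dones: shape (batch,); next_q_online, next_q_target: shape (batch, n_actions).
    """
    best_actions = np.argmax(next_q_online, axis=1)                        # online net selects
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]       # target net evaluates
    return rewards + gamma * (1.0 - dones) * evaluated

if __name__ == "__main__":
    rewards = np.array([1.0, 0.0, -1.0])
    dones = np.array([0.0, 0.0, 1.0])                                      # 1.0 marks a terminal transition
    q_online_next = np.array([[0.2, 0.5, 0.1], [0.9, 0.3, 0.4], [0.0, 0.1, 0.2]])
    q_target_next = np.array([[0.3, 0.4, 0.2], [0.8, 0.2, 0.5], [0.1, 0.0, 0.3]])
    print(double_dqn_targets(rewards, dones, q_online_next, q_target_next))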
