Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (14): 86-95. DOI: 10.3778/j.issn.1002-8331.2304-0158

• Pattern Recognition and Artificial Intelligence •

Improved Behavioral Cloning and DDPG’s Driverless Decision Model

LI Weidong, HUANG Zhenzhu, HE Jingwu, MA Caoyuan, GE Cheng

  1. School of Automotive Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2024-07-15 Published: 2024-07-15

Abstract: The key to driverless technology is a decision layer that issues accurate commands based on the information supplied by the perception stage. Reinforcement learning and imitation learning are better suited to complex scenarios than traditional rule-based methods. However, imitation learning, as represented by behavioral cloning, suffers from compounding errors; this paper applies prioritized experience replay to behavioral cloning to improve the model's fit to the demonstration dataset. The original DDPG (deep deterministic policy gradient) algorithm suffers from low exploration efficiency; experience pool separation and random network distillation (RND) are used to improve its training efficiency. The two improved algorithms are then trained jointly to reduce useless exploration in the early stage of DDPG training. Validation on the TORCS (The Open Racing Car Simulator) simulation platform shows that, within the same number of training iterations, the proposed method learns more stable lane keeping, speed keeping, and obstacle avoidance.

Key words: driverless vehicles, reinforcement learning, imitation learning, decision algorithm, The Open Racing Car Simulator (TORCS)
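
The abstract names prioritized experience replay as the change made to behavioral cloning but does not say which variant is used. The block below is a minimal sketch of the common proportional form, in which demonstration samples with larger recent error are drawn more often. The class name `PrioritizedBuffer` and the values of `alpha`, `beta`, and `eps` are assumptions for illustration, not details taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PrioritizedBuffer:
    """Proportional prioritized replay over demonstration samples (hypothetical helper)."""
    alpha: float = 0.6  # assumed: how strongly priorities skew sampling (0 = uniform)
    beta: float = 0.4   # assumed: strength of the importance-sampling correction
    eps: float = 1e-3   # keeps every priority strictly positive
    items: list = field(default_factory=list)
    priorities: list = field(default_factory=list)

    def add(self, item):
        # New samples take the current maximum priority so each is likely to be drawn soon.
        self.priorities.append(max(self.priorities, default=1.0))
        self.items.append(item)

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        probs = self.probabilities()
        n = len(self.items)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights compensate for the non-uniform sampling,
        # normalized so the largest weight in the batch is 1.
        weights = [(n * probs[i]) ** (-self.beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.items[i] for i in idx], weights

    def update(self, idx, errors):
        # Priority follows the magnitude of the latest per-sample error,
        # e.g. the imitation loss on that demonstration pair.
        for i, err in zip(idx, errors):
            self.priorities[i] = abs(err) + self.eps


# Usage: four demonstration pairs, the last one currently fitted worst.
buf = PrioritizedBuffer()
for i in range(4):
    buf.add((f"state_{i}", f"action_{i}"))
buf.update([0, 1, 2, 3], [0.1, 0.1, 0.1, 2.0])
indices, batch, is_weights = buf.sample(8, rng=random.Random(0))
```

In a training loop, the returned weights would scale each sample's loss, and `update` would be called with the new per-sample errors after every gradient step.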