Improved Behavioral Cloning and DDPG's Driverless Decision Model

doi:10.3778/j.issn.1002-8331.2304-0158

Abstract

Abstract: The key to driverless technology is for the decision-making layer to issue accurate commands based on the information supplied by the perception stage. Reinforcement learning and imitation learning are better suited to complex scenarios than traditional rule-based methods. However, imitation learning, as represented by behavioral cloning, suffers from compounding errors; this paper improves behavioral cloning with the prioritized experience replay algorithm, which strengthens the model's ability to fit the demonstration dataset. The original DDPG (deep deterministic policy gradient) algorithm suffers from low exploration efficiency; this paper improves it with experience pool separation and random network distillation (RND), which raises its training efficiency. The improved algorithms are then trained jointly to reduce useless exploration in the early stage of DDPG training. Verification on the TORCS (the open racing car simulator) simulation platform shows that, for the same number of training iterations, the proposed method learns more stable lane-keeping, speed-keeping, and obstacle-avoidance abilities.

Key words: autonomous driving, reinforcement learning, imitation learning, decision algorithm, the open racing car simulator (TORCS)
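
The prioritized replay step that the abstract applies to behavioral cloning can be illustrated with a minimal sketch. It assumes the proportional prioritization scheme of Schaul et al. (sampling probability proportional to priority raised to alpha, with importance weights raised to minus beta) and assumes that the per-sample imitation loss serves as the priority. The class name `PrioritizedDemoBuffer`, the hyperparameters `alpha`, `beta`, and `eps`, and the toy demonstrations are hypothetical and are not taken from the paper.

```python
import random


class PrioritizedDemoBuffer:
    """Proportional prioritized sampling over a fixed demonstration set.

    Illustrative only: the priority of a sample is assumed to be its most
    recent imitation loss, which is not confirmed by the paper.
    """

    def __init__(self, demos, alpha=0.6, beta=0.4, eps=1e-3):
        self.demos = list(demos)  # (state, expert_action) pairs
        self.alpha, self.beta, self.eps = alpha, beta, eps
        # Every sample starts with the same priority, so the first
        # draws are uniform until losses have been reported.
        self.priorities = [1.0] * len(self.demos)

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        probs = self.probabilities()
        n = len(self.demos)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance weights w_i = (N * P(i))^(-beta), scaled so the
        # largest weight in the batch is 1.
        weights = [(n * probs[i]) ** (-self.beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.demos[i] for i in idx], weights

    def update(self, idx, losses):
        # Poorly fitted demonstrations get a higher priority.
        for i, loss in zip(idx, losses):
            self.priorities[i] = abs(loss) + self.eps


# Toy usage: the third demonstration is fitted worst, so it is drawn most.
demos = [((0.0,), 0.0), ((0.5,), 0.1), ((1.0,), 0.9)]
buf = PrioritizedDemoBuffer(demos)
buf.update([0, 1, 2], [0.01, 0.01, 1.0])
idx, batch, weights = buf.sample(4, rng=random.Random(0))
```

In a training loop, the returned weights would scale each sample's imitation loss before the gradient step, and `update` would be called with the new per-sample losses so that demonstrations the policy still fits poorly are replayed more often.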

LI Weidong, HUANG Zhenzhu, HE Jingwu, MA Caoyuan, GE Cheng. Improved Behavioral Cloning and DDPG's Driverless Decision Model[J]. Computer Engineering and Applications, 2024, 60(14): 86-95.
