[1] LEVINE S, KUMAR A, TUCKER G, et al. Offline reinforcement learning: tutorial, review, and perspectives on open problems[J]. arXiv:2005.01643, 2020.
[2] OKUMURA B, JAMES M R, KANZAWA Y, et al. Challenges in perception and decision making for intelligent automotive vehicles: a case study[J]. IEEE Transactions on Intelligent Vehicles, 2016, 1(1): 20-32.
[3] WANG P, GAO S, LI L, et al. Obstacle avoidance path planning design for autonomous driving vehicles based on an improved artificial potential field algorithm[J]. Energies, 2019, 12(12): 2342.
[4] CUI Q, DING R, WEI C, et al. A hierarchical framework of emergency collision avoidance amid surrounding vehicles in highway driving[J]. Control Engineering Practice, 2021, 109: 104751.
[5] ZHANG Y, ZHANG J, ZHANG J, et al. A novel learning framework for sampling-based motion planning in autonomous driving[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 1202-1209.
[6] AHN J, KIM M, PARK J. Vision-based autonomous driving for unstructured environments using imitation learning[J]. arXiv:2202.10002, 2022.
[7] ZHANG L, ZHANG R, WU T, et al. Safe reinforcement learning with stability guarantee for motion planning of autonomous vehicles[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(12): 5435-5444.
[8] SHU K, YU H, CHEN X, et al. Autonomous driving at intersections: a behavior-oriented critical-turning-point approach for decision making[J]. IEEE/ASME Transactions on Mechatronics, 2021, 27(1): 234-244.
[9] HANG P, LV C, HUANG C, et al. Cooperative decision making of connected automated vehicles at multi-lane merging zone: a coalitional game approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(4): 3829-3841.
[10] FUJIMOTO S, MEGER D, PRECUP D. Off-policy deep reinforcement learning without exploration[C]//International Conference on Machine Learning, 2019: 2052-2062.
[11] FUJIMOTO S, GU S S. A minimalist approach to offline reinforcement learning[C]//Advances in Neural Information Processing Systems, 2021: 20132-20145.
[12] KUMAR A, ZHOU A, TUCKER G, et al. Conservative Q-learning for offline reinforcement learning[C]//Advances in Neural Information Processing Systems, 2020: 1179-1191.
[13] YU T, THOMAS G, YU L, et al. MOPO: model-based offline policy optimization[C]//Advances in Neural Information Processing Systems, 2020: 14129-14142.
[14] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008.
[15] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[16] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[17] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[18] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 213-229.
[19] ZHANG Y J, BAI X H, XIE B H. Multi-object tracking algorithm based on CNN-Transformer feature fusion[J]. Computer Engineering and Applications, 2024, 60(2): 180-190.
[20] FANG S Y, LIU B. Wavelet frequency division self-attention Transformer image deraining network[J]. Computer Engineering and Applications, 2024, 60(6): 259-273.
[21] CHEN L, LU K, RAJESWARAN A, et al. Decision transformer: reinforcement learning via sequence modeling[C]//Advances in Neural Information Processing Systems, 2021: 15084-15097.
[22] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[23] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Advances in Neural Information Processing Systems, 2020: 1877-1901.
[24] VAN SEIJEN H, FATEMI M, ROMOFF J, et al. Hybrid reward architecture for reinforcement learning[C]//Advances in Neural Information Processing Systems, 2017.
[25] LI Q, PENG Z, FENG L, et al. MetaDrive: composing diverse driving scenarios for generalizable reinforcement learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3461-3475.
[26] XUE Z, PENG Z, LI Q, et al. Guarded policy optimization with imperfect online demonstrations[C]//Proceedings of the Eleventh International Conference on Learning Representations, 2023.
[27] LI Q, PENG Z, WU H, et al. Human-AI shared control via policy dissection[C]//Advances in Neural Information Processing Systems, 2022: 8853-8867.
[28] FU J, KUMAR A, NACHUM O, et al. D4RL: datasets for deep data-driven reinforcement learning[J]. arXiv:2004.07219, 2020.
[29] LIU H, HUANG Z, MO X, et al. Augmenting reinforcement learning with Transformer-based scene representation learning for decision-making of autonomous driving[J]. arXiv:2208.12263, 2022.
[30] FANG X, ZHANG Q, GAO Y, et al. Offline reinforcement learning for autonomous driving with real world driving data[C]//2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), 2022: 3417-3422.
[31] SHI T, CHEN D, CHEN K, et al. Offline reinforcement learning for autonomous driving with safety and exploration enhancement[J]. arXiv:2110.07067, 2021.