Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (15): 91-100. DOI: 10.3778/j.issn.1002-8331.2304-0223

• Pattern Recognition and Artificial Intelligence •


Motion Planning Model for Autonomous Driving in Complex Traffic Scenarios

REN Jiajia, LIU Yinkui, HU Xuemin, XIANG Chen, LUO Xianzhi   

  1. School of Artificial Intelligence, Hubei University, Wuhan 430062, China
  • Online: 2024-08-01  Published: 2024-07-30



Abstract: To address the failure of existing autonomous driving motion planning methods to effectively exploit long-term continuous temporal features, as well as their low success rates in complex traffic scenarios, a Transformer-based motion planning model for autonomous driving in complex traffic scenarios is proposed. The method takes GPT-2 as its base model and casts offline reinforcement learning as a sequence modeling problem, which allows it to effectively capture long-range dependencies among the vehicle state, action, and reward data in the offline reinforcement learning model, enabling the model to learn more effectively from historical planning data and improving planning accuracy and safety in complex traffic scenarios. Experiments are conducted in the MetaDrive simulator, and the results show that the model achieves a success rate of up to 93% in complex traffic scenarios such as merging into a main road and entering a roundabout, which is 20, 19, and 13 percentage points higher than the success rates of the existing state-of-the-art methods of behavior cloning, batch-constrained deep Q-learning (BCQ), and twin delayed deep deterministic policy gradient with behavioral cloning (TD3+BC), respectively. This indicates that, compared with the baseline methods, the proposed method learns driving policies more effectively from low-quality datasets and exhibits better generalization performance and robustness.
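The sequence-modeling formulation described in the abstract — treating offline reinforcement learning trajectories as token sequences of returns, states, and actions fed to a GPT-style model — can be sketched as follows. This is a minimal illustrative sketch in the style of Decision-Transformer-like approaches; the function names, the undiscounted return-to-go, and the token layout are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Cumulative future reward at each timestep (undiscounted by default,
    a common choice in sequence-modeling formulations of offline RL)."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def interleave_trajectory(states, actions, rewards):
    """Arrange one logged trajectory as the interleaved token sequence
    (R_1, s_1, a_1, R_2, s_2, a_2, ...) that a GPT-style model would
    consume, so attention can span long-range state/action/reward
    dependencies across the whole trajectory."""
    rtg = returns_to_go(rewards)
    tokens = []
    for R, s, a in zip(rtg, states, actions):
        tokens.append(("return_to_go", float(R)))
        tokens.append(("state", s))
        tokens.append(("action", a))
    return tokens

# Toy trajectory: three steps of logged driving data.
states = ["s1", "s2", "s3"]
actions = ["a1", "a2", "a3"]
rewards = [1.0, 0.0, 2.0]
tokens = interleave_trajectory(states, actions, rewards)
```

At training time, such a sequence is fed to the Transformer, which is trained to predict the next action token conditioned on the preceding returns, states, and actions; at planning time, conditioning on a high target return steers the model toward successful maneuvers learned from the offline dataset.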

Key words: Transformer, offline reinforcement learning, complex traffic scenarios, autonomous driving, motion planning