Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (8): 306-314. DOI: 10.3778/j.issn.1002-8331.2112-0084

• Engineering and Applications •

Application of SAC-Based Autonomous Vehicle Control Method

NING Qiang, LIU Yuansheng, XIE Longyang

  1. College of Smart City, Beijing Union University, Beijing 100101, China
  2. Beijing Engineering Research Center of Smart Mechanical Innovation Design Service, Beijing 100101, China
  3. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  • Online: 2023-04-15  Published: 2023-04-15


Abstract: To address the slow network convergence and unstable training caused by the equal-probability sampling and random network initialization of the SAC (soft actor-critic) algorithm, an improved algorithm, PE-SAC (prioritized-replay soft actor-critic with expert data), is proposed, combining prioritized experience replay with expert data. The algorithm partitions the replay buffer by sample value and pre-trains the network with expert data, shrinking the autonomous vehicle's ineffective exploration space, reducing the number of trial-and-error episodes, and effectively improving learning efficiency. A reward function oriented toward multi-obstacle scenarios is also designed to broaden the algorithm's applicability. Simulation experiments on the CARLA platform show that the proposed method controls the autonomous vehicle to drive safely in its environment and, for the same number of training episodes, achieves higher reward values and faster convergence than the TD3 (twin delayed deep deterministic policy gradient) and SAC algorithms. Finally, a lidar point-cloud map is combined with PID (proportional-integral-derivative) control to narrow the gap between the simulation environment and real scenes, and the trained model is deployed on a low-speed autonomous vehicle in a campus setting to verify the algorithm's generality.
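The PE-SAC implementation itself is not given here, but the replay mechanism the abstract describes — a sample pool classified by sample value, with pre-loaded expert demonstrations drawn preferentially alongside the agent's own transitions — can be sketched minimally as follows. The class name, the `expert_ratio` parameter, and the fixed per-batch mixing rule are illustrative assumptions, not the paper's actual design:

```python
import random
from collections import deque

class ClassifiedReplayBuffer:
    """Illustrative replay buffer split by sample value: expert
    demonstrations live in their own pool and fill a fixed fraction
    of every training batch, so high-value transitions are replayed
    more often than ordinary exploration data."""

    def __init__(self, capacity=100_000, expert_ratio=0.3):
        self.expert = deque(maxlen=capacity)  # pre-loaded demonstrations
        self.agent = deque(maxlen=capacity)   # transitions from exploration
        self.expert_ratio = expert_ratio      # fraction of each batch from experts

    def add_expert(self, transition):
        self.expert.append(transition)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share from the expert pool, the rest from the agent pool.
        n_expert = min(int(batch_size * self.expert_ratio), len(self.expert))
        n_agent = min(batch_size - n_expert, len(self.agent))
        batch = random.sample(list(self.expert), n_expert)
        batch += random.sample(list(self.agent), n_agent)
        return batch
```

Pre-training then amounts to filling the expert pool from recorded drives and running the usual SAC updates on batches from this buffer before exploration begins, which is what lets the agent skip much of the early random trial-and-error.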

Key words: deep reinforcement learning, autonomous driving control, real-world scenarios

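The multi-obstacle reward function is only named, not specified, in the abstract. A hypothetical shaping reward of the kind commonly used for this task — rewarding speed tracking, penalizing lane deviation, and adding a penalty that grows as the nearest obstacle comes inside a safe distance — might look like this; every term, weight, and parameter name here is an assumption for illustration:

```python
def multi_obstacle_reward(speed, lane_offset, obstacle_dists, collided,
                          target_speed=8.0, safe_dist=5.0):
    """Hypothetical shaping reward for driving among several obstacles.

    speed          -- current vehicle speed (m/s)
    lane_offset    -- lateral distance from the lane centre (m)
    obstacle_dists -- distances to detected obstacles (m)
    collided       -- whether a collision occurred this step
    """
    if collided:
        return -100.0                          # large terminal penalty
    r_speed = 1.0 - abs(speed - target_speed) / target_speed
    r_lane = -abs(lane_offset)                 # stay near the lane centre
    nearest = min(obstacle_dists) if obstacle_dists else float("inf")
    # Penalty ramps up linearly once the nearest obstacle is closer than safe_dist.
    r_obstacle = -max(0.0, (safe_dist - nearest) / safe_dist)
    return r_speed + r_lane + r_obstacle
```

Because every obstacle contributes through the single `nearest` term, the same function applies unchanged whether the scene holds one obstacle or many, which is the kind of applicability the abstract claims for its reward design.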
