Application of SAC-Based Autonomous Vehicle Control Method

doi:10.3778/j.issn.1002-8331.2112-0084

Abstract

The SAC (soft actor-critic) algorithm samples transitions with equal probability and initializes its networks randomly, which slows network convergence and makes training unstable. To address these problems, an improved algorithm, PE-SAC (priority playback soft actor critic with expert), is proposed that combines prioritized replay with expert data. The algorithm classifies the sample pool according to sample value and pre-trains the network on expert data, which shrinks the ineffective exploration space of the autonomous vehicle, reduces the number of trial-and-error attempts, and improves the learning efficiency of the algorithm. A reward function for multiple obstacles is also designed to broaden the algorithm's applicability. Simulation experiments on the CARLA platform show that the proposed method is better able to keep the autonomous vehicle driving safely in the environment, and that for the same number of training iterations it achieves a higher reward and faster convergence than the TD3 (twin delayed deep deterministic policy gradient) and SAC algorithms. Finally, a radar point cloud map is combined with PID (proportional-integral-derivative) control to narrow the gap between the simulation environment and real scenes, and the trained model is deployed on a low-speed autonomous vehicle in a park to verify the generality of the algorithm.

Keywords: deep reinforcement learning, autonomous driving control, real-world scenarios
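
The abstract's idea of classifying the sample pool by sample value and seeding training with expert data can be illustrated with a small two-pool replay buffer. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `ClassifiedReplayBuffer`, the scalar `value` score, the `value_threshold` cut-off, the `high_fraction` mixing ratio, and the choice to place expert transitions in the high-value pool are all hypothetical, since the abstract does not specify how sample value is measured or how the pools are mixed.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer (illustrative only, not the paper's code).

    Transitions are split into a high-value pool and an ordinary pool
    by comparing a caller-supplied value score against a threshold.
    """

    def __init__(self, capacity=10000, value_threshold=1.0,
                 high_fraction=0.5, seed=None):
        self.high = deque(maxlen=capacity)    # high-value transitions
        self.normal = deque(maxlen=capacity)  # all other transitions
        self.value_threshold = value_threshold  # hypothetical cut-off
        self.high_fraction = high_fraction      # hypothetical mixing ratio
        self.rng = random.Random(seed)

    def add(self, transition, value):
        """Store a transition in the pool chosen by its value score."""
        pool = self.high if value >= self.value_threshold else self.normal
        pool.append(transition)

    def add_expert(self, transitions):
        """Assumption: expert demonstrations go to the high-value pool."""
        self.high.extend(transitions)

    def sample(self, batch_size):
        """Draw a mixed batch; fall back to the ordinary pool if needed."""
        n_high = min(len(self.high), round(batch_size * self.high_fraction))
        n_normal = min(len(self.normal), batch_size - n_high)
        batch = (self.rng.sample(list(self.high), n_high)
                 + self.rng.sample(list(self.normal), n_normal))
        self.rng.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.high) + len(self.normal)
```

In a pre-training phase of this kind, the buffer would first be filled through `add_expert` and the networks updated on batches drawn from it before the agent begins to explore; transitions collected later would then be routed by `add` according to whatever value measure is chosen (for example, reward or temporal-difference error, both of which are assumptions here).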

NING Qiang, LIU Yuansheng, XIE Longyang. Application of SAC-Based Autonomous Vehicle Control Method[J]. Computer Engineering and Applications, 2023, 59(8): 306-314.
