Computer Engineering and Applications, 2019, Vol. 55, Issue (10): 264-270. DOI: 10.3778/j.issn.1002-8331.1806-0324


Self-Driving Via Improved DDPG Algorithm

ZHANG Bin1, HE Ming1,2, CHEN Xiliang1, WU Chunxiao1, LIU Bin1, ZHOU Bo1   

  1. College of Command and Control Engineering, The Army Engineering University of PLA, Nanjing 210002, China
  2. Institute of Network Information, Academy of Systems Engineering, Academy of Military Sciences, Beijing 100071, China
  Online: 2019-05-15; Published: 2019-05-13

Abstract: As a classic deep reinforcement learning algorithm, the Deep Deterministic Policy Gradient (DDPG) algorithm has significant advantages in continuous control problems and has been applied to self-driving. To address the low training efficiency and slow convergence caused by the high proportion of illegal driving policies that DDPG produces in the absence of policy-action filtering, an improved algorithm based on failure experience correction is proposed. The algorithm divides the experience pool into a success pool and a failure pool, selects failure experiences for training according to driving performance, converts the single output of the policy network into throttle and brake control values, and explores unknown policies through normally distributed noise. Simulation experiments on the TORCS platform show that, compared with the DDPG and DQN (Deep Q-learning Network) algorithms, the proposed algorithm significantly improves training efficiency and reduces illegal driving policies to zero.
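The mechanisms the abstract names can be made concrete with a short sketch. The Python snippet below is a minimal illustration, not the paper's implementation: the pool capacities, the batch mixing ratio, the failure criterion, and the noise scale are all assumptions, since this page gives no such details. It shows an experience pool split into success and failure sub-pools with failure experiences drawn for training, a single actor output mapped to throttle and brake pedals, and exploration via normally distributed noise.

```python
import random
from collections import deque

import numpy as np

# Hypothetical capacities and thresholds; the paper does not specify exact values.
POOL_CAPACITY = 100_000
BATCH_SIZE = 64
FAILURE_RATIO = 0.25  # assumed fraction of each batch drawn from the failure pool


class DualExperiencePool:
    """Replay buffer split into success and failure pools, as the abstract describes.

    A transition is routed to the failure pool when driving performance
    indicates a failure (e.g. leaving the track); the routing criterion
    used by the caller stands in for the paper's driving-performance check.
    """

    def __init__(self):
        self.success = deque(maxlen=POOL_CAPACITY)
        self.failure = deque(maxlen=POOL_CAPACITY)

    def add(self, transition, failed):
        # Route each (s, a, r, s', done) tuple by the failure flag.
        (self.failure if failed else self.success).append(transition)

    def sample(self, batch_size=BATCH_SIZE):
        # Mix failure and success experiences in one training batch.
        n_fail = min(int(batch_size * FAILURE_RATIO), len(self.failure))
        n_succ = min(batch_size - n_fail, len(self.success))
        return (random.sample(list(self.failure), n_fail)
                + random.sample(list(self.success), n_succ))


def pedals_from_output(u):
    """Map a single actor output u in [-1, 1] to (throttle, brake).

    One interpretation of controlling both pedals with one network output:
    positive values accelerate, negative values brake, so the two pedals
    are never pressed at the same time.
    """
    return (max(u, 0.0), max(-u, 0.0))


def explore(u, sigma=0.1):
    """Exploration with normally distributed noise, clipped to the valid range."""
    return float(np.clip(u + np.random.normal(0.0, sigma), -1.0, 1.0))
```

A note on the single-output pedal mapping: because one scalar drives both pedals, throttle and brake can never be applied simultaneously, which rules out one class of illegal driving actions by construction.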

Key words: deep reinforcement learning, self-driving, DDPG algorithm, experience pool division, TORCS
