Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (7): 266-275. DOI: 10.3778/j.issn.1002-8331.2009-0518

• Engineering and Applications •


Research on Autonomous Vehicle Lane Change Strategy Algorithm Based on Improved Deep Q Network

ZHANG Xinchen, ZHANG Jun, LIU Yuansheng, LU Ming, XIE Longyang   

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2. College of Robotics, Beijing Union University, Beijing 100101, China
  3. College of Applied Science and Technology, Beijing Union University, Beijing 100101, China
  • Online: 2022-04-01  Published: 2022-04-01


Abstract: The deep Q network (DQN) model has been widely used for autonomous vehicle lane change strategy in highway scenarios, but the traditional DQN suffers from overestimation and slow convergence. To address these problems, an autonomous vehicle lane change strategy model based on an improved deep Q network is proposed. Firstly, the observed state values are fed into two neural networks with the same structure but different parameter update frequencies, which reduces the correlation between experience samples. Then the vehicle state information output by the hidden layer is fed simultaneously into a state value function stream and an action advantage function stream, so that the Q value of each action can be estimated more accurately. Furthermore, prioritized experience replay (PER) is adopted to sample experiences from the replay buffer, increasing the utilization of important samples. Finally, the proposed model is trained and tested in an experimental scenario built from the NGSIM dataset. The experimental results show that the improved deep Q network model enables the autonomous vehicle to understand state changes in the environment better than other DQN variants, improving both the success rate of the lane change strategy and the convergence speed of the network.
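The three ingredients the abstract describes — an online network paired with a slower-updating target network, the dueling state-value/action-advantage split, and proportional prioritized sampling — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear "networks" and random weights below are placeholders standing in for the hidden layers of the real model, and the hyperparameters (gamma, alpha) are typical defaults, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # placeholder weights only; not the paper's trained model

def dueling_q(features, w_v, w_a):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    The value stream and advantage stream share the same hidden features."""
    v = features @ w_v          # scalar state value V(s)
    a = features @ w_a          # advantage A(s, a), one entry per action
    return v + a - a.mean()     # subtract the mean so V and A are identifiable

def double_dqn_target(reward, next_feat, w_v, w_a, w_v_tgt, w_a_tgt,
                      gamma=0.99, done=False):
    """Two networks with different update frequencies: the online network
    (w_v, w_a) picks the next action, the target network (w_v_tgt, w_a_tgt)
    evaluates it, which counters overestimation."""
    if done:
        return reward
    a_star = int(np.argmax(dueling_q(next_feat, w_v, w_a)))
    return reward + gamma * dueling_q(next_feat, w_v_tgt, w_a_tgt)[a_star]

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: priority p_i = (|delta_i| + eps)^alpha, normalized,
    so transitions with large TD error are replayed more often."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()
```

In the full model these pieces work together: each transition's TD error (the gap between `double_dqn_target` and the current `dueling_q` estimate) sets its replay priority, so important lane-change experiences are sampled more frequently during training.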

Key words: autonomous vehicle, lane change strategy, state value function, action advantage function, prioritized experience replay