Research on Autonomous Vehicle Lane Change Strategy Algorithm Based on Improved Deep Q Network

doi:10.3778/j.issn.1002-8331.2009-0518

Abstract

Abstract: The deep Q network (DQN) model has been widely used for lane change decision-making of autonomous vehicles in highway scenarios, but the traditional DQN suffers from overestimation and slow convergence. To address these problems, a lane change decision model for autonomous vehicles based on an improved deep Q network is proposed. First, the obtained state values are fed into two neural networks with the same structure but different parameter update frequencies, which reduces the correlation between experience samples. Then, the autonomous vehicle state information output by the hidden layer is fed simultaneously into a state value function stream and an action advantage function stream, so that the Q value of each action in the model is obtained more accurately. Furthermore, prioritized experience replay (PER) is used to draw experience samples from the experience replay unit, increasing the utilization of important samples. Finally, the proposed model is trained and tested in an experimental scenario built from the NGSIM dataset. The experimental results show that the improved deep Q network model enables autonomous vehicles to understand state changes in the environment better than other DQN models, improving both the success rate of lane change decisions and the convergence speed of the network.

Key words: autonomous vehicle, lane change strategy, state value function, action advantage function, prioritized experience replay
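
The three components named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The mean-subtracted aggregation of the value and advantage streams, the proportional form of the priorities, the exponent `alpha=0.6`, and all function names are assumptions chosen for illustration, and the importance-sampling correction that usually accompanies PER is left out for brevity.

```python
import random


def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Assumed aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, done, q_online_next, q_target_next):
    """Bootstrapped target using two networks.

    The frequently updated (online) network selects the next action; the
    slowly updated (target) network evaluates it, which limits overestimation.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def per_sample(td_errors, batch_size, rng, alpha=0.6):
    """Draw replay-buffer indices, favouring transitions with large TD error."""
    probs = per_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical action set: keep lane, change left, change right.
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))
    print(double_dqn_target(1.0, 0.9, False, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0]))
    print(per_sample([2.0, 0.5, 0.1], batch_size=4, rng=random.Random(0)))
```

In the second call the online network prefers action 1, so the target bootstraps from the target network's estimate for that action rather than from its largest estimate, which is how the two-network arrangement counters overestimation.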

ZHANG Xinchen, ZHANG Jun, LIU Yuansheng, LU Ming, XIE Longyang. Research on Autonomous Vehicle Lane Change Strategy Algorithm Based on Improved Deep Q Network[J]. Computer Engineering and Applications, 2022, 58(7): 266-275.
