Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (23): 349-356.DOI: 10.3778/j.issn.1002-8331.2308-0435

• Engineering and Applications •

Multi-Agent Reinforcement Learning for On-Ramp Merging Control

LI Chun, WU Zhizhou, XU Hongxin, LIANG Yunyi   

  1. College of Intelligent Manufacturing Modern Industry, Xinjiang University, Urumqi 830017, China
    2.College of Transportation and Communications, Xinjiang University, Urumqi 830017, China
    3.Xinjiang Key Laboratory for Green Construction and Smart Traffic Control of Transportation Infrastructure, Xinjiang University, Urumqi 830017, China
    4.School of Engineering & Design, Technical University of Munich, Munich 80333, Germany
  • Online: 2024-12-01  Published: 2024-11-29


Abstract: On-ramp merging is a challenging task for connected automated vehicles (CAVs), given that mixed traffic consisting of CAVs and human-driven vehicles (HDVs) will appear in more and more traffic scenarios in the near future. Based on the traffic characteristics of the merging area, multi-vehicle cooperative merging is formulated as a Markov decision process, and a reward function that accounts for both vehicle safety and efficiency is established. An improved algorithm framework with centralized training and decentralized execution (CTDE) is proposed on the basis of the distributed multi-agent reinforcement learning (MARL) framework, saving computational resources on each individual agent. Two control algorithms, advantage actor-critic (A2C) and proximal policy optimization (PPO), are built on both frameworks. Finally, simulation results show that the improved algorithms outperform the original ones overall: they raise the average vehicle speed and maintain the minimum headway while reducing the collision rate and the merging waiting time, thereby ensuring safety and improving traffic efficiency in the merging area.
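The abstract describes a per-agent reward that trades off safety (maintaining a minimum headway, avoiding collisions) against efficiency (driving speed). The paper's exact terms and weights are not given here, so the sketch below is only an illustrative assumption: all names (`w_eff`, `w_safe`, `t_min`, the collision penalty of -100) are hypothetical, not taken from the paper.

```python
def merging_reward(speed: float, v_max: float, headway: float,
                   t_min: float = 1.0, collided: bool = False,
                   w_eff: float = 1.0, w_safe: float = 1.0) -> float:
    """Per-step reward for one merging agent: efficiency term + safety term.

    speed/v_max  -- current and maximum speed (efficiency in [0, 1])
    headway      -- time headway to the leading vehicle, in seconds
    t_min        -- minimum acceptable time headway (assumed 1 s here)
    """
    if collided:
        return -100.0                      # large penalty; episode would terminate
    efficiency = speed / v_max             # reward driving close to the speed limit
    # penalize only when following closer than the minimum time headway
    safety = 0.0 if headway >= t_min else -(t_min - headway) / t_min
    return w_eff * efficiency + w_safe * safety
```

A shaped reward of this form lets the same scalar signal drive both objectives: at safe headways the agent is rewarded purely for speed, while below `t_min` the safety penalty grows linearly until a collision overrides everything else.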

Key words: reinforcement learning, merging area, autonomous driving, multi-agent
