Computer Engineering and Applications, 2025, Vol. 61, Issue (19): 282-291. DOI: 10.3778/j.issn.1002-8331.2406-0105

• Engineering and Applications •

Multi-Vehicle Cooperative Control Method at Intersections Based on Recurrent Graph Attention Reinforcement Learning

杨伟达,吴志周,梁韵逸   

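  1. The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804
  2. Business School, University of Shanghai for Science and Technology, Shanghai 200093
  • Online: 2025-10-01 Published: 2025-09-30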

Multi-Vehicle Cooperative Control at Intersection:Recurrent Graph Attention Reinforcement Learning

YANG Weida, WU Zhizhou, LIANG Yunyi   

  1. The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804, China
  2. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Online: 2025-10-01 Published: 2025-09-30

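Abstract: For some time to come, connected automated vehicles (CAVs) and human-driven vehicles (HVs) will travel together through unsignalized intersections. Because a CAV's observation range is limited, it can make decisions only from the mixed-traffic information within its neighborhood. The multi-vehicle cooperative control process at unsignalized intersections is therefore modeled as a decentralized partially observable Markov decision process, and a centralized training and decentralized execution (CTDE) framework based on soft actor-critic (SAC) is established. The relationships among vehicles are modeled as a graph, and multiple attention layers serve as the convolution kernels of the actor and critic networks to infer the graph features of the vehicles in a CAV's neighborhood. A gated recurrent unit (GRU) is constructed to strengthen long-term memory of the dynamic neighborhood graph features and to avoid the information forgetting caused by large changes in neighborhood information as vehicles move. Simulation results show that, with no collisions, the proposed algorithm raises average vehicle speed by 10.51%, 4.64%, and 10.24% in the one-way intersection, turning-permitted intersection, and 2×2 intersection network scenarios, respectively, compared with the best existing decentralized control algorithm for unsignalized intersections.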

Keywords: unsignalized intersection, vehicle control, multi-agent reinforcement learning, autonomous driving, partially observable

Abstract: Connected automated vehicles (CAVs) and human-driven vehicles (HVs) will share unsignalized intersections for a long time to come. Because a CAV observes only part of its surroundings, it can make decisions only from the mixed-traffic information in its neighborhood. The multi-vehicle cooperative control process at unsignalized intersections is therefore modeled as a decentralized partially observable Markov decision process, and a centralized training and decentralized execution (CTDE) framework based on soft actor-critic (SAC) is proposed. The relationships between vehicles are modeled as a graph, and multiple attention layers serve as the convolution kernels of the actor and critic networks to infer the graph features of neighboring vehicles. A gated recurrent unit (GRU) is used to retain long-term memory of the dynamic neighbor graph features and to avoid the information forgetting caused by changes in neighborhood information as vehicles move. Simulation results show that the collision rate of the proposed algorithm is 0. Compared with the state-of-the-art multi-vehicle cooperative control algorithms, the average vehicle speed increases by 10.51%, 4.64%, and 10.24% in the one-way 1×1 intersection, multi-way 1×1 intersection, and multi-way 2×2 intersection network scenarios, respectively.

Key words: unsignalized intersection, vehicle control, multi-agent reinforcement learning, autonomous driving, partially observable
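
The abstract describes attention layers that aggregate features of the vehicles in a CAV's neighborhood, followed by a GRU that carries memory across time steps as that neighborhood changes. The following is a minimal NumPy sketch of that idea only, not the authors' implementation: the single attention head, the GAT-style scoring with a LeakyReLU slope of 0.2, the self-loop on the ego vehicle, the feature sizes, and all names (`attend`, `gru_step`, `init_params`, `rollout`) are assumptions made for illustration. The weights are random and untrained, and the SAC training loop is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(ego, neighbors, W, a):
    """Single-head GAT-style attention from the ego vehicle over itself and its neighbors.

    ego: (d,), neighbors: (n, d) with n >= 0, W: (d, k), a: (2k,).
    Returns the attention weights (n+1,) and the aggregated feature (k,).
    """
    nodes = np.vstack([ego[None, :], neighbors])         # self-loop keeps n = 0 valid
    h = nodes @ W                                         # projected features, (n+1, k)
    pairs = np.hstack([np.tile(h[0], (len(h), 1)), h])    # [ego || node] pairs, (n+1, 2k)
    scores = pairs @ a
    scores = np.where(scores > 0, scores, 0.2 * scores)   # LeakyReLU (assumed slope 0.2)
    w = np.exp(scores - scores.max())                     # numerically stable softmax
    alpha = w / w.sum()
    return alpha, alpha @ h

def gru_step(x, h, p):
    """Standard GRU cell update with input x and previous hidden state h."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])              # reset gate
    h_new = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])    # candidate state
    return (1.0 - z) * h + z * h_new

def init_params(rng, d, k):
    """Random, untrained weights; d and k are hypothetical feature sizes."""
    p = {"W": rng.normal(0, 0.5, (d, k)), "a": rng.normal(0, 0.5, 2 * k)}
    for g in "zrh":
        p["W" + g] = rng.normal(0, 0.5, (k, k))
        p["U" + g] = rng.normal(0, 0.5, (k, k))
        p["b" + g] = np.zeros(k)
    return p

def rollout(neighbor_counts, d=4, k=8, seed=0):
    """Run one ego vehicle through time steps with a varying number of neighbors."""
    rng = np.random.default_rng(seed)
    p = init_params(rng, d, k)
    h = np.zeros(k)
    alphas = []
    for n in neighbor_counts:
        ego = rng.normal(size=d)                 # synthetic ego observation
        neighbors = rng.normal(size=(n, d))      # synthetic neighbor observations
        alpha, x = attend(ego, neighbors, p["W"], p["a"])
        h = gru_step(x, h, p)                    # memory persists as neighbors change
        alphas.append(alpha)
    return h, alphas

if __name__ == "__main__":
    h, alphas = rollout([2, 5, 0])
    print(h.shape, [len(a) for a in alphas])
```

The hidden state keeps a fixed size while the neighbor count changes from step to step, which is the property that lets a recurrent layer sit on top of a dynamic neighborhood graph.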