Multi-Intersection Traffic Signal Cooperative Control Method Based on Multi-Agent Sequential Decision Making

doi:10.3778/j.issn.1002-8331.2405-0153

Abstract

Abstract: Large sequence models offer deep reinforcement learning a new route to cooperative control of traffic signals at multiple intersections. This paper proposes a multi-intersection traffic signal cooperative control method based on multi-agent sequential decision making. First, following the multi-agent advantage decomposition theorem, multi-intersection traffic signal control is modeled as a sequence problem, so that real-time control of multiple intersections becomes a multi-agent sequential decision-making problem; this exploits the close correspondence between the decision process in multi-agent reinforcement learning and prediction with a sequence model. Then, a few-shot Transformer sequence model learns the optimal control policy of each agent online, achieving cooperative control of traffic signals across intersections. This addresses two limitations of the centralized training with decentralized execution paradigm: it struggles to capture the full complexity of multi-agent interaction, and the optimal joint value function becomes harder to solve as the number of agents grows. Experimental results show that the proposed method significantly improves the performance of the traffic signal control algorithm and reduces the complexity of its implementation.

Key words: multi-agent advantage decomposition, sequential decision making, multiple intersections, cooperative control, reinforcement learning
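
The decomposition theorem named in the abstract states that the joint advantage of all agents equals the sum of per-agent advantages, where each agent's advantage is conditioned on the actions already chosen by the agents before it in the sequence. The toy Python sketch below checks this identity numerically for three intersections and then picks actions one agent at a time. It is only an illustration under stated assumptions: the Q-values in `joint_q`, the two-phase action set, and the uniform averaging over undecided agents are all hypothetical, and the exhaustive averaging stands in for the Transformer sequence model that the paper actually uses.

```python
from itertools import product

N_AGENTS = 3      # hypothetical: three intersections
ACTIONS = (0, 1)  # hypothetical: two signal phases per intersection


def joint_q(joint_action):
    """Hypothetical toy joint Q-value for one fixed observation."""
    a0, a1, a2 = joint_action
    return 1.0 * a0 + 2.0 * a1 * (1 - a0) + 0.5 * a2 * a1 - 0.25 * a2


def partial_q(prefix):
    """Q-value of a partial joint action, averaging the undecided agents
    over a uniform policy (an assumption made for this sketch)."""
    prefix = tuple(prefix)
    remaining = N_AGENTS - len(prefix)
    completions = list(product(ACTIONS, repeat=remaining))
    return sum(joint_q(prefix + c) for c in completions) / len(completions)


def local_advantage(prefix, action):
    """Advantage of one agent's action given the preceding agents' actions."""
    prefix = tuple(prefix)
    return partial_q(prefix + (action,)) - partial_q(prefix)


def joint_advantage(joint_action):
    """Joint advantage: joint Q-value minus the state value."""
    return joint_q(joint_action) - partial_q(())


def decomposed_advantage(joint_action):
    """Sum of per-agent advantages taken in agent order."""
    return sum(
        local_advantage(joint_action[:m], joint_action[m])
        for m in range(N_AGENTS)
    )


def sequential_greedy():
    """Each agent in turn picks the action with the largest local advantage."""
    chosen = []
    for _ in range(N_AGENTS):
        best = max(ACTIONS, key=lambda a: local_advantage(chosen, a))
        chosen.append(best)
    return tuple(chosen)


if __name__ == "__main__":
    for joint_action in product(ACTIONS, repeat=N_AGENTS):
        print(joint_action, joint_advantage(joint_action),
              decomposed_advantage(joint_action))
    print("sequential choice:", sequential_greedy())
```

Because each agent only searches its own action set, the sequential procedure evaluates a number of candidates that grows linearly with the number of agents, whereas a search over the joint action space grows exponentially.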

WANG Zhiwen, LU Yumei, ZHANG Haipeng, PANG Yuli. Multi-Intersection Traffic Signal Cooperative Control Method Based on Multi-Agent Sequential Decision Making[J]. Computer Engineering and Applications, 2025, 61(17): 344-354.
