Bus holding control method in public transit systems with multi-agent reinforcement learning

Computer Engineering and Applications, 2015, Vol. 51, Issue (17): 8-13.

CHEN Chunxiao1, CHEN Zhiya1,2, CHEN Weiya1

1. School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China
2. Xidian University, Xi’an 710071, China

Online: 2015-09-01 Published: 2015-09-14

Abstract

Abstract: Vehicle holding is a common and effective control strategy for reducing bus bunching and improving transit service reliability. Applying it requires dynamic decision-making in a stochastic, interactive system environment. Given that real-time transit operation information is available, this paper studies reinforcement-learning-based bus holding control in a fully cooperative multi-agent setting. A conceptual control model for a single bus line is built on a multi-agent system, in which each bus is modeled as an independent agent with learning ability. Within the learning framework, the agent state, action set, reward function, and a coordination mechanism that yields joint holding actions are defined. The hysteretic Q-learning algorithm is used to solve the holding problem. Simulation experiments show that the proposed approach effectively prevents bus bunching and keeps headways balanced on a single-line transit service.

Key words: bus holding, multi-agent reinforcement learning, multi-agent system, control strategy
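
For readers unfamiliar with the hysteretic Q-learning algorithm named in the abstract, the following is a minimal sketch of its update rule for a single agent. It is not the paper's implementation. The state labels, the holding-time action set, the reward value, and the rates `alpha`, `beta`, and `gamma` are all hypothetical choices made for illustration. The only property taken from the algorithm itself is that a positive temporal-difference error is applied with a larger learning rate than a negative one, which makes cooperative agents less sensitive to penalties caused by teammates' exploration.

```python
from collections import defaultdict


def hysteretic_q_update(q, state, action, reward, next_state, actions,
                        alpha=0.1, beta=0.01, gamma=0.9):
    """Apply one hysteretic Q-learning update in place and return the TD error.

    alpha (rate for non-negative TD errors) should exceed beta (rate for
    negative TD errors); the default values here are assumptions.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    delta = reward + gamma * best_next - q[(state, action)]
    # Good surprises are learned quickly, bad surprises slowly.
    rate = alpha if delta >= 0 else beta
    q[(state, action)] += rate * delta
    return delta


# Hypothetical setup: actions are holding times in seconds,
# states are coarse headway labels. Neither comes from the paper.
holding_actions = (0, 30, 60)
q_table = defaultdict(float)

# A rewarded transition raises the estimate using alpha.
d_pos = hysteretic_q_update(q_table, "short_headway", 30, 1.0,
                            "even_headway", holding_actions)

# A penalized transition lowers the estimate using the smaller beta.
d_neg = hysteretic_q_update(q_table, "short_headway", 30, -1.0,
                            "even_headway", holding_actions)
```

In a multi-agent setting each bus agent would keep its own table and call this update after every holding decision; how the agents coordinate their joint action is defined in the paper and is not reproduced here.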

CHEN Chunxiao, CHEN Zhiya, CHEN Weiya. Bus holding control method in public transit systems with multi-agent reinforcement learning[J]. Computer Engineering and Applications, 2015, 51(17): 8-13.
