Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (7): 294-301. DOI: 10.3778/j.issn.1002-8331.2111-0490

• Engineering and Applications •

Online Car-Hailing Dispatch Method Based on Local Position Perception Multi-Agent

HUANG Xiaohui, LING Jiahao, ZHANG Xiong, XIONG Liyan, ZENG Hui   

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  • Online: 2023-04-01 Published: 2023-04-01

Abstract: In recent years, online car-hailing has become an indispensable part of daily travel. The core task of a car-hailing platform is to dispatch each order to a suitable driver so that the overall waiting time of passengers is as short as possible and driver revenue is as high as possible. Existing research mainly builds models with greedy algorithms or reinforcement learning. However, most current methods consider only the immediate satisfaction of passengers; they fail to account effectively for the relative positions of vehicles and orders, and so do not reduce the waiting time of all passengers from a long-term perspective. To address this, this paper formulates order dispatch as a Markov process and proposes a multi-agent vehicle dispatch method based on local position perception. The method captures the spatio-temporal relationship between passengers and vehicles through suitably designed input states and convolutional neural networks, reducing the overall waiting time of passengers in the long run. Experimental results show that, across maps of different sizes and different numbers of vehicles and orders, the proposed method outperforms existing methods and generalizes better. In complex scenarios with large numbers of passengers and vehicles in particular, its results are significantly better than those of existing methods.

Key words: multi-agent reinforcement learning, vehicle scheduling, local perception, deep reinforcement learning
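
To make the idea of a locally perceived input state more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a square grid map, integer (row, col) positions, and a three-channel window centred on one vehicle; the function name `local_observation`, the channel layout, and the `radius` parameter are all hypothetical choices made for illustration. A stack of channels like this is the kind of input a convolutional neural network could consume.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) on a grid map; the grid itself is an assumption


def local_observation(
    agent: Cell,
    other_vehicles: List[Cell],
    orders: List[Cell],
    grid_size: int,
    radius: int = 1,
) -> List[List[List[int]]]:
    """Build a 3-channel local window centred on one vehicle (hypothetical layout).

    Channel 0: number of waiting orders in each cell.
    Channel 1: number of other vehicles in each cell.
    Channel 2: 1 if the cell lies inside the map, else 0.
    """
    side = 2 * radius + 1
    obs = [[[0] * side for _ in range(side)] for _ in range(3)]

    def to_local(cell: Cell):
        # Offset relative to the agent, shifted so the agent sits at the centre.
        r = cell[0] - agent[0] + radius
        c = cell[1] - agent[1] + radius
        return (r, c) if 0 <= r < side and 0 <= c < side else None

    # Count orders and other vehicles that fall inside the window.
    for channel, cells in ((0, orders), (1, other_vehicles)):
        for cell in cells:
            loc = to_local(cell)
            if loc is not None:
                obs[channel][loc[0]][loc[1]] += 1

    # Mark which window cells are actually on the map.
    for r in range(side):
        for c in range(side):
            gr = agent[0] + r - radius
            gc = agent[1] + c - radius
            if 0 <= gr < grid_size and 0 <= gc < grid_size:
                obs[2][r][c] = 1
    return obs


if __name__ == "__main__":
    window = local_observation(
        agent=(0, 0),
        other_vehicles=[(0, 1), (5, 5)],
        orders=[(1, 1), (1, 1), (0, 0)],
        grid_size=10,
        radius=1,
    )
    for name, channel in zip(("orders", "vehicles", "in_map"), window):
        print(name, channel)
```

Because every position is expressed relative to the vehicle, the same window shape applies on maps of any size, which is one plausible reason a local state could help generalization; the paper's actual state design may differ.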