Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (1): 41-55. DOI: 10.3778/j.issn.1002-8331.2108-0467

• Research Hotspots and Reviews •

Survey on Vehicle Reinforcement Learning in Routing Problem

NIU Pengfei, WANG Xiaofeng, LU Lei, ZHANG Jiulong   

  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
  • Online:2022-01-01 Published:2022-01-06




Abstract: The vehicle routing problem is a core problem in logistics transportation optimization. Its goal is to find the lowest-cost vehicle routing plan that meets customers' needs. However, as the scale of logistics transportation grows, the vehicle routing problem becomes harder to solve and its real-time requirements become stricter, so conventional algorithms no longer meet practical requirements. In recent years, methods based on reinforcement learning and deep reinforcement learning have become an important approach to solving the vehicle routing problem. Following a brief review of conventional solution methods, this survey summarizes the current reinforcement-learning-based algorithms for the vehicle routing problem and classifies them as dynamic-programming-based, value-based, or policy-based. It also summarizes the theoretical foundations and research status of these algorithms. Finally, future research directions for solving the vehicle routing problem with reinforcement learning and deep reinforcement learning are discussed.

Key words: vehicle routing problem, Markov decision process, reinforcement learning, deep reinforcement learning

