Survey on Vehicle Reinforcement Learning in Routing Problem

doi:10.3778/j.issn.1002-8331.2108-0467

Abstract

Abstract: Vehicle routing problem is the key technologies in the field of logistics research. Its purpose is to get a lowest cost vehicle routing plan while meeting the customer’s needs. However, with the increasing of problem size in logistics transportation, the real-time requirement of solving vehicle routing problem is increasing, and the traditional algorithm cannot realize the requirements of the industry gradually. For decades, a number of new methods use reinforcement learning and deep reinforcement learning to solve vehicle routing problem. Base on simple analysis of conventional methods for solving vehicle routing problem, this review summaries the current algorithms for solving vehicle routing problem based on reinforcement learning. Reinforcement learning algorithms are divided into dynamic programming, value-based and policy-based. This paper summarizes the theoretical foundation and studying status. Finally, the future development direction of vehicle routing problem based on reinforcement learning and deep reinforcement learning is prospected.

Key words: vehicle routing problem, Markov decision process, reinforcement learning, deep reinforcement learning

摘要： 车辆路径问题是物流运输优化中的核心问题，目的是在满足顾客需求下得到一条最低成本的车辆路径规划。但随着物流运输规模的不断增大，车辆路径问题求解难度增加，并且对实时性要求也不断提高，已有的常规算法不再适应实际要求。近年来，基于强化学习算法开始成为求解车辆路径问题的重要方法，在简要回顾常规方法求解车辆路径问题的基础上，重点总结基于强化学习求解车辆路径问题的算法，并将算法按照基于动态规划、基于价值、基于策略的方式进行了分类；最后对该问题未来的研究进行了展望。

关键词: 车辆路径问题, 马尔科夫决策过程, 强化学习, 深度强化学习

NIU Pengfei, WANG Xiaofeng, LU Lei, ZHANG Jiulong. Survey on Vehicle Reinforcement Learning in Routing Problem[J]. Computer Engineering and Applications, 2022, 58(1): 41-55.

牛鹏飞, 王晓峰, 芦磊, 张九龙. 强化学习在车辆路径问题中的研究综述[J]. 计算机工程与应用, 2022, 58(1): 41-55.

References

[1] DANTZIG G B，RAMSER J H.The truck dispatching problem[J].Management Science，1959，6（1）：80-91.
[2] LENSTRA J K，KAN A H G R.Complexity of vehicle routing and scheduling problems[J].Networks，1981，11（2）：221-227.
[3] CLARKE G，WRIGHT J W.Scheduling of vehicles from a central depot to a number of delivery points[J].Operations Research，1964，12（4）：568-581.
[4] JUNGER M，REINELT G，RINALDI G.The traveling salesman problem[J].Handbooks in Operations Research and Management Science，1995，7：225-330.
[5] VIDAL T，CRAINIC T G，GENDREAU M，et al.Heuristics for multi-attribute vehicle routing problems：A survey and synthesis[J].European Journal of Operational Research，2013，231（1）：1-21.
[6] ULMER M W，THOMAS B W.Same-day delivery with heterogeneous fleets of drones and vehicles[J].Networks，2018，72（4）：475-505.
[7] MANDZIUK J.New shades of the vehicle routing problem：Emerging problem formulations and computational intelligence solution methods[J].IEEE Transactions on Emerging Topics in Computational Intelligence，2018，3（3）：230-244.
[8] CAMM J D，MAGAZINE M J，KUPPUSAMY S，et al.The demand weighted vehicle routing problem[J].European Journal of Operational Research，2017，262（1）：151-162.
[9] LU D，GZARA F.The robust vehicle routing problem with time windows：Solution by branch and price and cut[J].European Journal of Operational Research，2019，275（3）：925-938.
[10] LI H，CHANG X，ZHAO W，et al.The vehicle flow formu lation and savings-based algorithm for the rollon-rolloff vehicle routing problem[J].European Journal of Operational Research，2017，257（3）：859-869.
[11] MOONS S，RAMAEKERS K，CARIS A，et al.Integrating production scheduling and vehicle routing decisions at the operational decision level：A review and discussion[J].Computers & Industrial Engineering，2017，104：224-245.
[12] KONG Y，TANG J F，DONG G，et al.An insertion algorithm for vehicle scheduling in picking up and delivering customers to airport[J].Control Theory & Applications，2009，26（1）：92-96.
[13] 穆东，王超，王胜春，等.基于并行模拟退火算法求解时间依赖型车辆路径问题[J].计算机集成制造系统，2015，21（6）：1626-1636.
MU D，WANG C，WANG S C，et al.Solving time-dependent vehicle routing problem based on parallel simulated annealing algorithm[J].Computer Integrated Manufacturing System，2015，21（6）：1626-1636.
[14] WANG Y，WU Q，GLOVER F.Effective metaheuristic algorithms for the minimum differential dispersion problem[J].European Journal of Operational Research，2017，258（3）：829-843.
[15] PINTEA C M，CHIRA C，DUMITRESCU D，et al.Senstive ants in solving the generalized vehicle routing problem[J].International Journal of Computers Communications & Control，2011，6（4）：734-741.
[16] FERNANDEZ-V ARGAS J A，BO NILLA-PETRICIOLET A.Development of a global optimization algorithm in ant colonies with feasible region selection for continuous search spaces[J].Revista Internacional de Metodos Numericos Para Calculoy Diseno en Ingenieria，2014，30（3）：178-187.
[17] JIA Y H，CHEN W N，GU T L，et al.A dynamic logistic dispatching system with set-based particle swarm optimization[J].IEEE Transactions on Systems Man Cybernetics-Systems，2018，48（9）：1607-1621.
[18] SUTTON R S.Learning to predict by the methods of temporal differences[J].Machine Learning，1988，3（1）：9-44.
[19] WATKINS C J C H，DAYAN P.Q-learning[J].Machine Learning，1992，8（3/4）：279-292.
[20] MNIH V，KAVUKCUOGLU K，Silver D，et al.Playing ATARI with deep reinforcement learning[J].arXiv：1312. 5602，2013.
[21] WILLIAMS R J.Simple statistical gradient-following algorithms for connectionist reinforcement learning[J].Machine Learning，1992，8（3）：229-256.
[22] PETERS J，SCHAAL S.Natural actor-critic[J].Neurocom- puting，2008，71（7/9）：1180-1190.
[23] HA D，SCHMIDHUBER J.World models[J].arXiv：1803. 10122，2018.
[24] SECOMANDI N.Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands[J].Computers & Operations Research，2000，27（11/12）：1201-1225.
[25] TATARAKIS A，MINIS I.Stochastic single vehicle routing with a predefined customer sequence and multiple depot returns[J].European Journal of Operational Research，2009，197（2）：557-571.
[26] POWELL W B，SIMAO H P，BOUZAIENE-AYARI B.Approximate dynamic programming in transportation and logistics：A unified framework[J].EURO Journal on Transportation and Logistics，2012，1（3）：237-284.
[27] ?IMEN M，SOYSAL M.Time-dependent green vehicle routing problem with stochastic vehicle speeds：An approximate dynamic programming algorithm[J].Transportation Research Part D（Transport and Environment），2017，54：82-98.
[28] ULMER M W，MATTFELD D C，K?ster F.Budgeting time for dynamic vehicle routing with stochastic customer requests[J].Transportation Science，2018，52（1）：20-37.
[29] ULMER M W，GOODSON J C，MATTFELD D C，et al.Offline?online approximate dynamic programming for dynamic vehicle routing with stochastic requests[J].Transportation Science，2019，53（1）：185-202.
[30] KOK A L，HANS E W，SCHUTTEN J M J.Vehicle routing under time-dependent travel times：the impact of congestion avoidance[J].Computers & Operations Research，2012，39（5）：910-918.
[31] SECOMANDI N，MARGOT F.Reoptimization approaches for the vehicle-routing problem with stochastic demands[J].Operations Research，2009，57（1）：214-230.
[32] GOODSON J C，OHLMANN J W，THOMAS B W.Rollout policies for dynamic solutions to the multivehicle routing problem with stochastic demand and duration limits[J].Operations Research，2013，61（1）：138-154.
[33] KOOL W，VAN HOOF H，GROMICHO J，et al.Deep policy dynamic programming for vehicle routing problems[J].arXiv：2102.11756，2021.
[34] ZHANG C，DELLAERT N P，ZHAO L，et al.Single vehicle routing with stochastic demands：approximate dynamic programming[R].Eindhoven：Technische Universiteit Eindhoven，2013.
[35] JOE W，LAU H C.Deep reinforcement learning approach to solve dynamic vehicle routing problem with stochastic customers[C]//Proceedings of the International Conference on Automated Planning and Scheduling，2020：394-402.
[36] BOUHAMED O，GHAZZAI H，BESBES H，et al.Q-learning based routing scheduling for a multi-task autonomous agent[C]//Proceedings of IEEE 62nd International Midwest Symposium on Circuits and Systems（MWSCAS），2019：634-637.
[37] BDEIR A，BOEDER S，DERNEDDE T，et al.RP-DQN：An application of Q-learning to vehicle routing problems[J].arXiv：2104.12226，2021.
[38] KOOL W，VAN HOOF H，WELLING M.Attention，learn to solve routing problems![J].arXiv：1803.08475，2018.
[39] CHEN X，ULMER M W，THOMAS B W.Deep Q-learning for same-day delivery with vehicles and drones[J/OL].European Journal of Operational Research（2021?06?17）[2021?08?01].https：//www.sciencedirect.com/science/article/abs/pii/S0377221721005361.
[40] ULMER M W，GOODSON J C，MATTFELD D C，et al.On modeling stochastic dynamic vehicle routing problems[J].EURO Journal on Transportation and Logistics，2020，9（2）：100008.
[41] WANG Z，SCHAUL T，HESSEL M，et al.Dueling network architectures for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning.New York：Curran Associates，2016：1995-2003.
[42] ZHANG W，WANG Q，LI J，et al.Dynamic fleet manage ment with rewriting deep reinforcement learning[J].IEEE Access，2020，8：143333-143341.
[43] 刘建伟，高峰，罗雄麟.基于值函数和策略梯度的深度强化学习综述[J].计算机学报，2019，42（6）：1406-1438.
LIU J W，G F，L X L.Survey of deep reinforcement learning based on value function and policy gradient[J].Chinese Journal of Computers，2019，42（6）：1406-1438.
[44] VINYALS O，FORTUNATO M，JAITLY N.Pointer networks[J].Advances in Neural Information Processing Systems，2015，28：2692-2700.
[45] NAZARI M，OROOJLOOY A，TAKAC M，et al.Reinforce ment learning for solving the vehicle routing problem[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.New Work：Curran Associates，2018：9861-9871.
[46] XIN L，SONG W，CAO Z，et al.Step-wise deep learning models for solving routing problems[J].IEEE Transactions on Industrial Informatics，2020，17（7）：4861-4871.
[47] FALKNER J K，SCHMIDT-THIEME L.Learning to solve vehicle routing problems with time windows through joint attention[J].arXiv：2006.09100，2020.
[48] XIN L，SONG W，CAO Z，et al.Multi-decoder attention model with embedding glimpse for solving vehicle routing problems[C]//Proceedings of 35th AAAI Conference on Artificial Intelligence，2021：12042-12049.
[49] ZHANG K，HE F，ZHANG Z，et al.Multi-vehicle routing problems with soft time windows：A multi-agent reinforcement learning approach[J].Transportation Research Part C（Emerging Technologies），2020，121：102861.
[50] XU Y，FANG M，CHEN L，et al.Reinforcement learning with multiple relational attention for solving vehicle routing problems[J/OL].IEEE Transactions on Cybernetics（2021-07-08）[2021-08-01].https：//ieeexplore.ieee.org/abstract/document/9478307.
[51] PENG B，WANG J，ZHANG Z.A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems[C]//Proceedings of International Symposium on Intelligence Computation and Applications，2019：636-650.
[52] LU H，ZHANG X，YANG S.A learning-based iterative method for solving vehicle routing problems[C]//Proceedings of International Conference on Learning Representations，2021.
[53] HOTTUNG A，TIERNEY K.Neural large neighborhood search for the capacitated vehicle routing problem[J].arXiv：1911.09539，2019.
[54] CHEN X，TIAN Y.Learning to perform local rewriting for combinatorial optimization[J].arXiv：1810.00337，2018.
[55] ZHAO J，MAO M，ZHAO X，et al.A hybrid of deep reinforcement learning and local search for the vehicle routing problems[J].IEEE Transactions on Intelligent Transportation Systems，2020，61（1）：7208-7218.
[56] LI J，XIN L，CAO Z，et al.Heterogeneous attentions for solving pickup and delivery problem via deep reinforcement learning[J/OL].IEEE Transactions on Intelligent Transportation Systems（2021-02-10）[2021-08-01].https：//ieeexplore.ieee.org/abstract/document/9352489.
[57] GAO L，CHEN M，CHEN Q，et al.Learn to design the heuristics for vehicle routing problem[J].arXiv：2002. 08539，2020.
[58] VERA J M，ABAD A G.Deep reinforcement learning for routing a heterogeneous fleet of vehicles[C]//Proceedings of 2019 IEEE Latin American Conference on Computational Intelligence（LA-CCI），2019：1-6.