深度强化学习求解车辆路径问题的研究综述

doi:10.3778/j.issn.1002-8331.2210-0153

摘要/Abstract

摘要： 车辆路径问题（VRP）是组合优化问题中经典的NP难问题，广泛应用于交通、物流等领域，随着问题规模和动态因素的增多，传统算法很难快速、智能地求解复杂的VRP问题。近年来随着人工智能技术的发展，尤其是深度强化学习（DRL）在AlphaGo中的成功应用，为路径问题求解提供了全新思路。鉴于此，针对近年来利用DRL求解VRP及其变体问题的模型进行文献综述。回顾了DRL求解VRP的相关思路，并梳理基于DRL求解VRP问题的关键步骤，对基于指针网络、图神经网络、Transformer和混合模型的四类求解方法分类总结，同时对目前基于DRL求解VRP及其变体问题的模型性能进行对比分析，总结了基于DRL求解VRP问题时遇到的挑战以及未来的研究方向。

关键词: 车辆路径问题, 深度强化学习, 指针网络, 图神经网络, 混合模型

Abstract: Vehicle routing problem（VRP） is a classic NP-hard problem, which is widely used in transportation, logistics and other fields. With the scale of problem and dynamic factor increasing, the traditional method of solving the VRP is challenged in computational speed and intelligence. In recent years, with the rapid development of artificial intelligence technology, in particular, the successful application of reinforcement learning in AlphaGo provides a new idea for solving routing problems. In view of this, this paper mainly summarizes the recent literature using deep reinforcement learning to solve VRP and its variants. Firstly, it reviews the relevant principles of DRL to solve VRP and sort out the key steps of DRL-based to solve VRP. Then it systematically classifies and summarizes the pointer network, graph neural network, Transformer and hybrid models four types of solving methods, meanwhile this paper also compares and analyzes the current DRL-based model performance in solving VRP and its variants. Finally, this paper sums up the challenge of DRL-based to solve VRP and future research directions.

Key words: vehicle routing problem, deep reinforcement learning, pointer network, graph neural network, hybrid model

杨笑笑, 柯琳, 陈智斌. 深度强化学习求解车辆路径问题的研究综述[J]. 计算机工程与应用, 2023, 59(5): 1-13.

YANG Xiaoxiao, KE Lin, CHEN Zhibin. Review of Deep Reinforcement Learning Model Research on Vehicle Routing Problems[J]. Computer Engineering and Applications, 2023, 59(5): 1-13.

参考文献

[1] COOK W J，CUNNINGHAM W H，PULLEYBLANK W R，et al.Combinatorial optimization[M].[S.l.]：Wiley，2010：2-30.
[2] KARIMI-MAMAGHAN M，MOHAMMADI M，MEYER P，et al.Machine learning at the service of meta-heuristics for solving combinatorial optimization problems：a state-of-the-art[J].European Journal of Operational Research，2021，296（2）：393-422.
[3] 刘振宏，蔡茂诚.组合最优化算法和复杂性[M].北京：清华大学出版社，1988：1-19.
LIU Z H，CAI M C.Combinatorial optimization algorithms and complexity[M].Beijing：Tsinghua University Press，1988：1-19.
[4] LECUN Y，BENGIO Y，HINTON G.Deep learning[J].Nature，2015，521（7553）：436-444.
[5] SMIRNOV E A，TIMOSHENKO D M，RIANOV S N.Comparison of regularization methods for ImageNet classi-fication with deep convolutional neural networks[J].AASRI Procedia，2014，6：89-94.
[6] YOGATAMA D，BLUNSOM P，DYER C，et al.Learning to compose words into sentences with reinforcement learning[J].arXiv：1611.09100，2016.
[7] AKSHITA，SMITA.Recommender system：review[J].International Journal of Computer Applications，2013，71（24）：38-42.
[8] 刘全，翟建伟，章宗长，等.深度强化学习综述[J].计算机学报，2018，41（1）：1-27.
LIU Q，ZHAI J W，ZHANG Z Z，et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers，2018，41（1）：1-27.
[9] SILVER D，HUANG A，MADDISON C J，et al.Mastering the game of go with deep neural networks and tree search[J].Nature，2016，529（7587）：484-489.
[10] TANG Z，SHAO K，ZHAO D，et al.Recent progress of deep reinforcement learning：from AlphaGo to AlphaGo Zero[J].Control Theory & Applications，2017，34（12）：1529-1546.
[11] 李珺，段钰蓉，郝丽艳，等.混合优化算法求解同时送取货车辆路径问题[J].计算机科学与探索，2022，16（7）：1623-1632.
LI J，DUAN Y R，HAO L Y，et al.Hybrid optimization algorithm for vehicle routing problem with simultaneous delivery-pickup[J].Journal of Frontiers of Computer Science and Technology，2022，16（7）：1623-1632.
[12] VINALYS O，FORTUNATO M，JAITLY N.Pointer networks[C]//Advances in Neural Information Processing Systems，2015.
[13] BAHDANAU D，CHO K，BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv：1409.0473，2014.
[14] SCARSELLI F，GORI M，TSOI A C，et al.The graph neural network model[J].IEEE Transactions on Neural Networks，2008，20（1）：61-80.
[15] MA Q，GE S，HE D，et al.Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning[J].arXiv：1911.04936，2019.
[16] VASWANI A，SHAZEER N，PARMAR N，et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems，Long Beach，California，Dec 4-9，2017.Red Hook：Curran Associates Inc，2017：6000-6010.
[17] KOOL W，VAN HOOF H，WELLING M.Attention，learn to solve routing problems[J].arXiv：1803.08475，2018.
[18] PUTERMAN M L.Markov decision processes[J].Handbooks in Operations Research and Management Science，1990，2：331-434.
[19] LI Y.Deep reinforcement learning：an overview[J].arXiv：1701.07274，2017.
[20] EDELKAMP S，GATH M，CAZENAVE T，et al.Algorithm and knowledge engineering for the TSPTW problem[C]//2013 IEEE Symposium on Computational Intelligence in Scheduling，2013：44-51.
[21] AKHTAR M，HANNAN M A，BEGUM R A，et al.Backtracking search algorithm in CVRP models for efficient solid waste collection and route optimization[J].Waste Management，2017，61：117-128.
[22] GAMBARDELLA L M，TAILLARD é，AGAZZI G.MACS-VRPTW：a multiple colony system for vehicle routing problems with time windows[M]//New ideas in optimization.London：McGraw-Hill，1999.
[23] WANG Z，SHEU J B.Vehicle routing problem with drones[J].Transportation Research Part B：Methodological，2019，122：350-364.
[24] PENNAP H V，SUBRAMANIAN A，OCHI L S.An iterated local search heuristic for the heterogeneous fleet vehicle routing problem[J].Journal of Heuristics，2013，19（2）：201-232.
[25] BELLO I，PHAM H，LE Q V，et al.Neural combinatorial optimization with reinforcement learning[J].arXiv：1611. 09940，2016.
[26] LI K，ZHANG T，WANG R.Deep reinforcement learning for multi-objective optimization[J].IEEE Transactions on Cybernetics，2020，51（6）：3103-3114.
[27] BRESSON X，LAURENT T.The transformer network for the traveling salesman problem[J].arXiv：2103.03012，2021.
[28] DEUDON M，COURNUT P，LACOSTE A，et al.Learning heuristics for the TSP by policy gradient[C]//Proceedings of the International Conference on the Integration of Constraint Programming，Artificial Intelligence，and Operations Research，Delft，June 26-29，2018.Cham：Springer，2018：170-181.
[29] WU Y，SONG W，CAO Z，et al.Learning improvement heuristics for solving routing problems[J].IEEE Transactions on Neural Networks and Learning Systems，2022，33（9）：5057-5069.
[30] KHALIL E，DAI H，ZHANG Y，et al.Learning combinatorial optimization algorithms over graphs[C]//Advances in Neural Information Processing Systems，2017.
[31] JOSHI C K，LAURENT T，BRESSON X.On learning paradigms for the travelling salesman problem[J].arXiv：1910.07210，2019.
[32] CAPPART Q，MOISAN T，ROUSSEAU L M，et al.Combining reinforcement learning and constraint programming for combinatorial optimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence，2021：3677-3687.
[33] FU Z H，QIU K B，ZHA H.Generalize a small pretrained model to arbitrarily large TSP instances[C]//Proceedings of the AAAI Conference on Artificial Intelligence，2021，35（8）：7474-7482.
[34] BOGYRBAYEVA A，YOON T，KO H，et al.A deep reinforcement learning approach for solving the traveling salesman problem with drone[J].arXiv：2112.12545，2021.
[35] OREN J，ROSS C，LEFAROV M，et al.SOLO：search online，learn offline for combinatorial optimization problems[C]//Proceedings of the International Symposium on Combinatorial Search，Jinan，July 26-30，2021.Palo Alto：AAAI，2021：97-105.
[36] LI K，ZHANG T，WANG R，et al.Deep reinforcement learning for combinatorial optimization：covering salesman problems[J].IEEE Transactions on Cybernetics，2022，52（12）：13142-13155.
[37] 王扬，陈智斌，杨笑笑，等.深度强化学习结合图注意力模型求解TSP问题[J].南京大学学报（自然科学版），2022，58（3）：420-429.
WANG Y，CHEN Z B，YANG X X，et al.Deep reinforcement learning combined with graph attention model to solve TSP[J].Journal of Nanjing University（Natural Sciences），2022，58（3）：420-429.
[38] BASSO R，KULCSAR B，SANCHEZ-DIAZ I，et al.Dynamic stochastic electric vehicle routing with safe reinforcement learning[J].Transportation Research Part E：Logistics and Transportation Review，2022，157：102496.
[39] ZHANG R，PROKHORCHUK A，DAUWELS J.Deep reinforcement learning for traveling salesman problem with time windows and rejections[C]//2020 International Joint Conference on Neural Networks（IJCNN），2020：1-8.
[40] NAZARI M，OROOJLOOY A，SNYDER L，et al.Reinforcement learning for solving the vehicle routing problem[C]//Advances in Neural Information Processing Systems，2018.
[41] CHEN X Y，TIAN Y D.Learning to perform local rewriting for combinatorial optimization[C]//Proceedings of the 33rd Conference on Advances in Neural Information Processing Systems，Vancouver，Dec 8-14，2019.Curran：NIPS，2019：6281-6292.
[42] GAO L，CHEN M，CHEN Q，et al.Learn to design the heuristics for vehicle routing problem[J].arXiv：2002. 08539，2020.
[43] 王扬，陈智斌.一种动态图转换模型求解CVRP问题[J/OL].计算机工程与科学：1-11[2022-06-18].http：//kns.cnki.net/kcms/detail/43.1258.TP.20220510.1905.002.html.
WANG Y，CHEN Z B.Solve CVRP by a dynamic graph transformer model[J/OL].Computer Engineering and Science：1-11[2022-06-18].http：//kns.cnki.net/kcms/detail/43. 1258.TP.20220510.1905.002.html.
[44] ZHAO J，MAO M，ZHAO X，et al.A hybrid of deep reinforcement learning and local search for the vehicle routing problems[J].IEEE Transactions on Intelligent Transportation Systems，2020，22（11）：7208-7218.
[45] VERA J M，ABAD A G.Deep reinforcement learning for routing a heterogeneous fleet of vehicles[C]//2019 IEEE Latin American Conference on Computational Intelligence，2019：1-6.
[46] LIN B，GHADDAR B，NATHWANI J.Deep reinforcement learning for the electric vehicle routing problem with time windows[J].IEEE Transactions on Intelligent Transportation Systems，2022，23（8）：11528-11538.
[47] PAN W，LIU S Q.Deep reinforcement learning for the dynamic and uncertain vehicle routing problem[J].Applied Intelligence，2022：1-18.
[48] PENG B，WANG J，ZHANG Z.A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems[C]//International Symposium on Intelligence Computation and Applications.Singapore：Springer，2019：636-650.
[49] FALKNER J K，THYSSENS D，SCHMIDT-THIEME L.Large neighborhood search based on neural construction heuristics[J].arXiv：2205.00772，2022.
[50] HOTTUNG A，TIERNEY K.Neural large neighborhood search for the capacitated vehicle routing problem[J].arXiv：1911.09539，2019.
[51] MA Y，LI J，CAO Z，et al.Learning to iteratively solve routing problems with dual-aspect collaborative transformer[C]//Advances in Neural Information Processing Systems，2021：11096-11107.
[52] MA Y，LI J，CAO Z，et al.Efficient neural neighborhood search for pickup and delivery problems[J].arXiv：2204. 11399，2022.
[53] NOWAK A，VILLAR S，BANDEIRA A S，et al.Revised note on learning quadratic assignment with graph neural networks[C]//2018 IEEE Data Science Workshop（DSW），2018：1-5.
[54] ZHENG H，LI X，LI Y，et al.GCN-GAN：integrating graph convolutional network and generative adversarial network for traffic flow prediction[J].IEEE Access，2022，10：94051-94062.
[55] GROSHEV E，GOLDSTEIN M，TAMAR A，et al.Learning generalized reactive policies using deep neural networks[C]//Twenty-Eighth International Conference on Automated Planning and Scheduling，2018.
[56] JOSHI C K，LAURENT T，BRESSON X.An efficient graph convolutional network technique for the traveling salesman problem[J].arXiv：1906.01227，2019.
[57] PRATES M，AVELAR P H C，LEMOS H，et al.Learning to solve NP-complete problems：a graph neural network for decision TSP[C]//Proceedings of the AAAI Conference on Artificial Intelligence，2019，33（1）：4731-4738.
[58] JAMES J Q，YU W，GU J.Online vehicle routing with neural combinatorial optimization and deep reinforcement learning[J].IEEE Transactions on Intelligent Transportation Systems，2019，20（10）：3806-3817.
[59] LU H，ZHANG X，YANG S.A learning-based iterative method for solving vehicle routing problems[C]//International Conference on Learning Representations，2019.
[60] XIN L，SONG W，CAO Z，et al.NeuroLKH：combining deep learning model with Lin-Kernighan-Helsgaun heuristic for solving the traveling salesman problem[C]//Advances in Neural Information Processing Systems，2021：7472-7483.
[61] 王原，陈名，邢立宁，等.用于求解旅行商问题的深度智慧型蚁群优化算法[J].计算机研究与发展，2021，58（8）：1586-1598.
WANG Y，CHEN M，XING L N，et al.Deep intelligent ant colony optimization for solving traveling salesman problem[J].Journal of Computer Research and Development，2021，58（8）：1586-1598.
[62] ZHENG J，HE K，ZHOU J，et al.Combining reinforcement learning with Lin-Kernighan-Helsgaun algorithm for the traveling salesman problem[C]//Proceedings of the AAAI Conference on Artificial Intelligence，2021.
[63] WANG R，HUA Z，LIU G，et al.A bilevel framework for learning to solve combinatorial optimization on graphs[C]//Advances in Neural Information Processing Systems，2021：21453-21466.
[64] KOOL W，VAN HOOF H，GROMICHO J，et al.Deep policy dynamic programming for vehicle routing problems[C]//International Conference on Integration of Constraint Programming，Artificial Intelligence，and Operations Research.Cham：Springer，2022：190-213.
[65] BO P，WANG J H，ZHANG Z Z.A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems[C]//Proceedings of the International Symposium on Intelligence Computation and Applications，Guangzhou，Nov 16-17，2019.Singapore：Springer，2019：636-650.
[66] DELARUE A，ANDERSON R，TJANGRAATMADJA C.Reinforcement learning with combinatoria lactions：an application to vehicle routing[C]//Advances in Neural Information Processing Systems，2020：609-620.
[67] DA COSTA P R，RHUGGENAATH J，ZHANG Y，et al.Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning，2020：465-480.
[68] 郭田德，韩丛英，唐思琦.组合优化机器学习方法[M].北京：科学出版社，2019：74-98.
GUO T D，HAN C Y，TANG S Q.Machine learning methods for combinatorial optimization[M].Beijing：Science Press，2019：74-98.
[69] ALIPOUR M M，RAZAVI S N，DERAKHSHI M F，et al.A hybrid algorithm using a genetic algorithm and multiagent reinforcement learning heuristic to solve the traveling salesman problem[J].Neural Computing and Applications，2018，30（9）：2935-2951.
[70] MAZYAVKINA N，SVIRIDOV S，IVANOV S，et al.Reinforcement learning for combinatorial optimization：a survey[J].Computers & Operations Research，2021，134：105400.
[71] KOTARY J，FIORETTO F，VAN HENTENRYCK P，et al.End-to-end constrained optimization learning：a survey[J].arXiv：2103.16378，2021.
[72] CAPPART Q，CHETELAT D，KHALIL E，et al.Combinatorial optimization and reasoning with graph neural networks[J].arXiv：2102.09544，2021.
[73] BENGIO Y，LODI A，PROUVOST A.Machine learning for combinatorial optimization：a methodological tour d’horizon[J].European Journal of Operational Research，2021，290（2）：405-421.