Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (1): 41-55. DOI: 10.3778/j.issn.1002-8331.2108-0467
• Research Hotspots and Reviews •
Survey on Reinforcement Learning in Vehicle Routing Problem
NIU Pengfei, WANG Xiaofeng, LU Lei, ZHANG Jiulong
Online: 2022-01-01
Published: 2022-01-06
牛鹏飞,王晓峰,芦磊,张九龙
NIU Pengfei, WANG Xiaofeng, LU Lei, ZHANG Jiulong. Survey on Reinforcement Learning in Vehicle Routing Problem[J]. Computer Engineering and Applications, 2022, 58(1): 41-55.
牛鹏飞, 王晓峰, 芦磊, 张九龙. 强化学习在车辆路径问题中的研究综述[J]. 计算机工程与应用, 2022, 58(1): 41-55.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2108-0467