计算机工程与应用 ›› 2023, Vol. 59 ›› Issue (5): 1-13.DOI: 10.3778/j.issn.1002-8331.2210-0153
杨笑笑,柯琳,陈智斌
出版日期:
2023-03-01
发布日期:
2023-03-01
YANG Xiaoxiao, KE Lin, CHEN Zhibin
Online:
2023-03-01
Published:
2023-03-01
摘要: 车辆路径问题(VRP)是组合优化问题中经典的NP难问题,广泛应用于交通、物流等领域,随着问题规模和动态因素的增多,传统算法很难快速、智能地求解复杂的VRP问题。近年来随着人工智能技术的发展,尤其是深度强化学习(DRL)在AlphaGo中的成功应用,为路径问题求解提供了全新思路。鉴于此,针对近年来利用DRL求解VRP及其变体问题的模型进行文献综述。回顾了DRL求解VRP的相关思路,并梳理基于DRL求解VRP问题的关键步骤,对基于指针网络、图神经网络、Transformer和混合模型的四类求解方法分类总结,同时对目前基于DRL求解VRP及其变体问题的模型性能进行对比分析,总结了基于DRL求解VRP问题时遇到的挑战以及未来的研究方向。
杨笑笑, 柯琳, 陈智斌. 深度强化学习求解车辆路径问题的研究综述[J]. 计算机工程与应用, 2023, 59(5): 1-13.
YANG Xiaoxiao, KE Lin, CHEN Zhibin. Review of Deep Reinforcement Learning Model Research on Vehicle Routing Problems[J]. Computer Engineering and Applications, 2023, 59(5): 1-13.
[1] COOK W J,CUNNINGHAM W H,PULLEYBLANK W R,et al.Combinatorial optimization[M].[S.l.]:Wiley,2010:2-30. [2] KARIMI-MAMAGHAN M,MOHAMMADI M,MEYER P,et al.Machine learning at the service of meta-heuristics for solving combinatorial optimization problems:a state-of-the-art[J].European Journal of Operational Research,2021,296(2):393-422. [3] 刘振宏,蔡茂诚.组合最优化算法和复杂性[M].北京:清华大学出版社,1988:1-19. LIU Z H,CAI M C.Combinatorial optimization algorithms and complexity[M].Beijing:Tsinghua University Press,1988:1-19. [4] LECUN Y,BENGIO Y,HINTON G.Deep learning[J].Nature,2015,521(7553):436-444. [5] SMIRNOV E A,TIMOSHENKO D M,RIANOV S N.Comparison of regularization methods for ImageNet classi-fication with deep convolutional neural networks[J].AASRI Procedia,2014,6:89-94. [6] YOGATAMA D,BLUNSOM P,DYER C,et al.Learning to compose words into sentences with reinforcement learning[J].arXiv:1611.09100,2016. [7] AKSHITA,SMITA.Recommender system:review[J].International Journal of Computer Applications,2013,71(24):38-42. [8] 刘全,翟建伟,章宗长,等.深度强化学习综述[J].计算机学报,2018,41(1):1-27. LIU Q,ZHAI J W,ZHANG Z Z,et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27. [9] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489. [10] TANG Z,SHAO K,ZHAO D,et al.Recent progress of deep reinforcement learning:from AlphaGo to AlphaGo Zero[J].Control Theory & Applications,2017,34(12):1529-1546. [11] 李珺,段钰蓉,郝丽艳,等.混合优化算法求解同时送取货车辆路径问题[J].计算机科学与探索,2022,16(7):1623-1632. LI J,DUAN Y R,HAO L Y,et al.Hybrid optimization algorithm for vehicle routing problem with simultaneous delivery-pickup[J].Journal of Frontiers of Computer Science and Technology,2022,16(7):1623-1632. [12] VINALYS O,FORTUNATO M,JAITLY N.Pointer networks[C]//Advances in Neural Information Processing Systems,2015. [13] BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014. [14] SCARSELLI F,GORI M,TSOI A C,et al.The graph neural network model[J].IEEE Transactions on Neural Networks,2008,20(1):61-80. [15] MA Q,GE S,HE D,et al.Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning[J].arXiv:1911.04936,2019. [16] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems,Long Beach,California,Dec 4-9,2017.Red Hook:Curran Associates Inc,2017:6000-6010. [17] KOOL W,VAN HOOF H,WELLING M.Attention,learn to solve routing problems[J].arXiv:1803.08475,2018. [18] PUTERMAN M L.Markov decision processes[J].Handbooks in Operations Research and Management Science,1990,2:331-434. [19] LI Y.Deep reinforcement learning:an overview[J].arXiv:1701.07274,2017. [20] EDELKAMP S,GATH M,CAZENAVE T,et al.Algorithm and knowledge engineering for the TSPTW problem[C]//2013 IEEE Symposium on Computational Intelligence in Scheduling,2013:44-51. [21] AKHTAR M,HANNAN M A,BEGUM R A,et al.Backtracking search algorithm in CVRP models for efficient solid waste collection and route optimization[J].Waste Management,2017,61:117-128. [22] GAMBARDELLA L M,TAILLARD é,AGAZZI G.MACS-VRPTW:a multiple colony system for vehicle routing problems with time windows[M]//New ideas in optimization.London:McGraw-Hill,1999. [23] WANG Z,SHEU J B.Vehicle routing problem with drones[J].Transportation Research Part B:Methodological,2019,122:350-364. [24] PENNAP H V,SUBRAMANIAN A,OCHI L S.An iterated local search heuristic for the heterogeneous fleet vehicle routing problem[J].Journal of Heuristics,2013,19(2):201-232. [25] BELLO I,PHAM H,LE Q V,et al.Neural combinatorial optimization with reinforcement learning[J].arXiv:1611. 09940,2016. [26] LI K,ZHANG T,WANG R.Deep reinforcement learning for multi-objective optimization[J].IEEE Transactions on Cybernetics,2020,51(6):3103-3114. [27] BRESSON X,LAURENT T.The transformer network for the traveling salesman problem[J].arXiv:2103.03012,2021. [28] DEUDON M,COURNUT P,LACOSTE A,et al.Learning heuristics for the TSP by policy gradient[C]//Proceedings of the International Conference on the Integration of Constraint Programming,Artificial Intelligence,and Operations Research,Delft,June 26-29,2018.Cham:Springer,2018:170-181. [29] WU Y,SONG W,CAO Z,et al.Learning improvement heuristics for solving routing problems[J].IEEE Transactions on Neural Networks and Learning Systems,2022,33(9):5057-5069. [30] KHALIL E,DAI H,ZHANG Y,et al.Learning combinatorial optimization algorithms over graphs[C]//Advances in Neural Information Processing Systems,2017. [31] JOSHI C K,LAURENT T,BRESSON X.On learning paradigms for the travelling salesman problem[J].arXiv:1910.07210,2019. [32] CAPPART Q,MOISAN T,ROUSSEAU L M,et al.Combining reinforcement learning and constraint programming for combinatorial optimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2021:3677-3687. [33] FU Z H,QIU K B,ZHA H.Generalize a small pretrained model to arbitrarily large TSP instances[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2021,35(8):7474-7482. [34] BOGYRBAYEVA A,YOON T,KO H,et al.A deep reinforcement learning approach for solving the traveling salesman problem with drone[J].arXiv:2112.12545,2021. [35] OREN J,ROSS C,LEFAROV M,et al.SOLO:search online,learn offline for combinatorial optimization problems[C]//Proceedings of the International Symposium on Combinatorial Search,Jinan,July 26-30,2021.Palo Alto:AAAI,2021:97-105. [36] LI K,ZHANG T,WANG R,et al.Deep reinforcement learning for combinatorial optimization:covering salesman problems[J].IEEE Transactions on Cybernetics,2022,52(12):13142-13155. [37] 王扬,陈智斌,杨笑笑,等.深度强化学习结合图注意力模型求解TSP问题[J].南京大学学报(自然科学版),2022,58(3):420-429. WANG Y,CHEN Z B,YANG X X,et al.Deep reinforcement learning combined with graph attention model to solve TSP[J].Journal of Nanjing University(Natural Sciences),2022,58(3):420-429. [38] BASSO R,KULCSAR B,SANCHEZ-DIAZ I,et al.Dynamic stochastic electric vehicle routing with safe reinforcement learning[J].Transportation Research Part E:Logistics and Transportation Review,2022,157:102496. [39] ZHANG R,PROKHORCHUK A,DAUWELS J.Deep reinforcement learning for traveling salesman problem with time windows and rejections[C]//2020 International Joint Conference on Neural Networks(IJCNN),2020:1-8. [40] NAZARI M,OROOJLOOY A,SNYDER L,et al.Reinforcement learning for solving the vehicle routing problem[C]//Advances in Neural Information Processing Systems,2018. [41] CHEN X Y,TIAN Y D.Learning to perform local rewriting for combinatorial optimization[C]//Proceedings of the 33rd Conference on Advances in Neural Information Processing Systems,Vancouver,Dec 8-14,2019.Curran:NIPS,2019:6281-6292. [42] GAO L,CHEN M,CHEN Q,et al.Learn to design the heuristics for vehicle routing problem[J].arXiv:2002. 08539,2020. [43] 王扬,陈智斌.一种动态图转换模型求解CVRP问题[J/OL].计算机工程与科学:1-11[2022-06-18].http://kns.cnki.net/kcms/detail/43.1258.TP.20220510.1905.002.html. WANG Y,CHEN Z B.Solve CVRP by a dynamic graph transformer model[J/OL].Computer Engineering and Science:1-11[2022-06-18].http://kns.cnki.net/kcms/detail/43. 1258.TP.20220510.1905.002.html. [44] ZHAO J,MAO M,ZHAO X,et al.A hybrid of deep reinforcement learning and local search for the vehicle routing problems[J].IEEE Transactions on Intelligent Transportation Systems,2020,22(11):7208-7218. [45] VERA J M,ABAD A G.Deep reinforcement learning for routing a heterogeneous fleet of vehicles[C]//2019 IEEE Latin American Conference on Computational Intelligence,2019:1-6. [46] LIN B,GHADDAR B,NATHWANI J.Deep reinforcement learning for the electric vehicle routing problem with time windows[J].IEEE Transactions on Intelligent Transportation Systems,2022,23(8):11528-11538. [47] PAN W,LIU S Q.Deep reinforcement learning for the dynamic and uncertain vehicle routing problem[J].Applied Intelligence,2022:1-18. [48] PENG B,WANG J,ZHANG Z.A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems[C]//International Symposium on Intelligence Computation and Applications.Singapore:Springer,2019:636-650. [49] FALKNER J K,THYSSENS D,SCHMIDT-THIEME L.Large neighborhood search based on neural construction heuristics[J].arXiv:2205.00772,2022. [50] HOTTUNG A,TIERNEY K.Neural large neighborhood search for the capacitated vehicle routing problem[J].arXiv:1911.09539,2019. [51] MA Y,LI J,CAO Z,et al.Learning to iteratively solve routing problems with dual-aspect collaborative transformer[C]//Advances in Neural Information Processing Systems,2021:11096-11107. [52] MA Y,LI J,CAO Z,et al.Efficient neural neighborhood search for pickup and delivery problems[J].arXiv:2204. 11399,2022. [53] NOWAK A,VILLAR S,BANDEIRA A S,et al.Revised note on learning quadratic assignment with graph neural networks[C]//2018 IEEE Data Science Workshop(DSW),2018:1-5. [54] ZHENG H,LI X,LI Y,et al.GCN-GAN:integrating graph convolutional network and generative adversarial network for traffic flow prediction[J].IEEE Access,2022,10:94051-94062. [55] GROSHEV E,GOLDSTEIN M,TAMAR A,et al.Learning generalized reactive policies using deep neural networks[C]//Twenty-Eighth International Conference on Automated Planning and Scheduling,2018. [56] JOSHI C K,LAURENT T,BRESSON X.An efficient graph convolutional network technique for the traveling salesman problem[J].arXiv:1906.01227,2019. [57] PRATES M,AVELAR P H C,LEMOS H,et al.Learning to solve NP-complete problems:a graph neural network for decision TSP[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2019,33(1):4731-4738. [58] JAMES J Q,YU W,GU J.Online vehicle routing with neural combinatorial optimization and deep reinforcement learning[J].IEEE Transactions on Intelligent Transportation Systems,2019,20(10):3806-3817. [59] LU H,ZHANG X,YANG S.A learning-based iterative method for solving vehicle routing problems[C]//International Conference on Learning Representations,2019. [60] XIN L,SONG W,CAO Z,et al.NeuroLKH:combining deep learning model with Lin-Kernighan-Helsgaun heuristic for solving the traveling salesman problem[C]//Advances in Neural Information Processing Systems,2021:7472-7483. [61] 王原,陈名,邢立宁,等.用于求解旅行商问题的深度智慧型蚁群优化算法[J].计算机研究与发展,2021,58(8):1586-1598. WANG Y,CHEN M,XING L N,et al.Deep intelligent ant colony optimization for solving traveling salesman problem[J].Journal of Computer Research and Development,2021,58(8):1586-1598. [62] ZHENG J,HE K,ZHOU J,et al.Combining reinforcement learning with Lin-Kernighan-Helsgaun algorithm for the traveling salesman problem[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2021. [63] WANG R,HUA Z,LIU G,et al.A bilevel framework for learning to solve combinatorial optimization on graphs[C]//Advances in Neural Information Processing Systems,2021:21453-21466. [64] KOOL W,VAN HOOF H,GROMICHO J,et al.Deep policy dynamic programming for vehicle routing problems[C]//International Conference on Integration of Constraint Programming,Artificial Intelligence,and Operations Research.Cham:Springer,2022:190-213. [65] BO P,WANG J H,ZHANG Z Z.A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems[C]//Proceedings of the International Symposium on Intelligence Computation and Applications,Guangzhou,Nov 16-17,2019.Singapore:Springer,2019:636-650. [66] DELARUE A,ANDERSON R,TJANGRAATMADJA C.Reinforcement learning with combinatoria lactions:an application to vehicle routing[C]//Advances in Neural Information Processing Systems,2020:609-620. [67] DA COSTA P R,RHUGGENAATH J,ZHANG Y,et al.Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning,2020:465-480. [68] 郭田德,韩丛英,唐思琦.组合优化机器学习方法[M].北京:科学出版社,2019:74-98. GUO T D,HAN C Y,TANG S Q.Machine learning methods for combinatorial optimization[M].Beijing:Science Press,2019:74-98. [69] ALIPOUR M M,RAZAVI S N,DERAKHSHI M F,et al.A hybrid algorithm using a genetic algorithm and multiagent reinforcement learning heuristic to solve the traveling salesman problem[J].Neural Computing and Applications,2018,30(9):2935-2951. [70] MAZYAVKINA N,SVIRIDOV S,IVANOV S,et al.Reinforcement learning for combinatorial optimization:a survey[J].Computers & Operations Research,2021,134:105400. [71] KOTARY J,FIORETTO F,VAN HENTENRYCK P,et al.End-to-end constrained optimization learning:a survey[J].arXiv:2103.16378,2021. [72] CAPPART Q,CHETELAT D,KHALIL E,et al.Combinatorial optimization and reasoning with graph neural networks[J].arXiv:2102.09544,2021. [73] BENGIO Y,LODI A,PROUVOST A.Machine learning for combinatorial optimization:a methodological tour d’horizon[J].European Journal of Operational Research,2021,290(2):405-421. |
[1] | 朱志国, 李伟玥, 姜盼, 周沛瑶. 图神经网络会话推荐系统综述[J]. 计算机工程与应用, 2023, 59(5): 55-69. |
[2] | 吴国栋, 王雪妮, 刘玉良. 知识图谱增强的图神经网络推荐研究进展[J]. 计算机工程与应用, 2023, 59(4): 18-29. |
[3] | 王永贵, 赵晓暄. 结合自监督学习的图神经网络会话推荐[J]. 计算机工程与应用, 2023, 59(3): 244-252. |
[4] | 张杰, 甄柳琳, 徐硕, 翟东升. 融合传递熵的图神经网络农产品期货预测模型[J]. 计算机工程与应用, 2023, 59(2): 321-328. |
[5] | 魏婷婷, 袁唯淋, 罗俊仁, 张万鹏. 智能博弈对抗中的对手建模方法及其应用综述[J]. 计算机工程与应用, 2022, 58(9): 19-29. |
[6] | 何倩倩, 孙静宇, 曾亚竹. 基于邻域感知图神经网络的会话推荐[J]. 计算机工程与应用, 2022, 58(9): 107-115. |
[7] | 高敬鹏, 胡欣瑜, 江志烨. 改进DDPG无人机航迹规划算法[J]. 计算机工程与应用, 2022, 58(8): 264-272. |
[8] | 赵庶旭, 元琳, 张占平. 多智能体边缘计算任务卸载[J]. 计算机工程与应用, 2022, 58(6): 177-182. |
[9] | 邓心, 那俊, 张瀚铎, 王昱林, 张斌. 基于深度强化学习的智能灯个性化调节方法[J]. 计算机工程与应用, 2022, 58(6): 264-270. |
[10] | 徐博, 周建国, 吴静, 罗威. 可编程数据平面下基于DDPG的路由优化方法[J]. 计算机工程与应用, 2022, 58(3): 143-150. |
[11] | 王永贵, 王阳, 陶明阳, 蔡永旺. 融合全局信息的注意力增强会话推荐方法[J]. 计算机工程与应用, 2022, 58(21): 131-141. |
[12] | 杨涛, 解庆, 刘永坚, 刘平峰. 主题感知的长文本自动摘要算法[J]. 计算机工程与应用, 2022, 58(20): 165-173. |
[13] | 温建伟, 姚冰冰, 万剑雄, 李雷孝. 结合深度强化学习的区块链分片系统性能优化[J]. 计算机工程与应用, 2022, 58(19): 116-123. |
[14] | 靳栋银, 李跃, 邵振洲, 施智平, 关永. 面向机械臂轨迹规划的强化学习奖励函数设计[J]. 计算机工程与应用, 2022, 58(19): 302-308. |
[15] | 方红, 苏铭, 冯一铂, 张澜. 结合gazetteers和句法依存树的中文命名实体识别[J]. 计算机工程与应用, 2022, 58(18): 227-232. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||