
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (12): 28-44. DOI: 10.3778/j.issn.1002-8331.2410-0274
LI Chengjian, SONG Shuyi, SU Yu, CHEN Zhibin
Online: 2025-06-15
Published: 2025-06-13
Abstract: The multi-objective traveling salesman problem (MOTSP) is a combinatorial optimization problem (COP) of significant practical value, arising widely in logistics distribution, production scheduling, and network communication. MOTSP requires not only balancing multiple objectives but also finding diverse Pareto sets, which represent the globally or locally optimal trade-offs among those objectives. When solving MOTSP, traditional multi-objective optimization algorithms typically face high computational complexity and low solving efficiency, and struggle to find diverse Pareto-optimal solutions, especially when balancing diversity in the decision space against diversity in the objective space. In recent years, deep reinforcement learning (DRL) has shown great potential on complex optimization problems, offering a new approach to solving MOTSP and to diversifying its Pareto sets. This paper introduces the basic concepts of MOTSP and its solution methods; discusses optimization strategies in reinforcement learning (RL) and neural network models in deep learning (DL); summarizes theoretical DRL approaches to solving MOTSP, analyzes the optimization quality and time efficiency of representative models, highlights the performance of different DRL models on large-scale MOTSP instances, and examines their limitations, directions for improvement, and applicable scenarios, together with strategies for escaping local optima. Finally, it categorizes four major application areas of MOTSP and points out future research directions.
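The abstract hinges on two notions: a tour's cost vector under multiple objectives, and the Pareto set of mutually non-dominated solutions. A minimal brute-force sketch (the instance, function names, and cost matrices below are illustrative assumptions, not taken from the paper) makes these definitions concrete for a small bi-objective TSP:

```python
import itertools
import random

def tour_costs(tour, dist1, dist2):
    """Evaluate one tour under two cost matrices (e.g. distance and travel time)."""
    n = len(tour)
    c1 = sum(dist1[tour[i]][tour[(i + 1) % n]] for i in range(n))
    c2 = sum(dist2[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return c1, c2

def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(n, dist1, dist2):
    """Enumerate all tours of n cities and keep the non-dominated cost vectors."""
    points = []
    for perm in itertools.permutations(range(1, n)):  # fix city 0 to break rotational symmetry
        points.append(tour_costs((0,) + perm, dist1, dist2))
    return sorted({p for p in points
                   if not any(dominates(q, p) for q in points)})

random.seed(0)
n = 7
d1 = [[abs(i - j) for j in range(n)] for i in range(n)]            # objective 1: line distance
d2 = [[random.randint(1, 9) for _ in range(n)] for _ in range(n)]  # objective 2: random "time"
front = pareto_front(n, d1, d2)
print(front)
```

Exhaustive enumeration is only feasible for toy sizes (here 6! tours); the DRL methods surveyed in the paper exist precisely because this enumeration explodes combinatorially, and because the two objectives generally conflict, yielding a whole front of trade-off tours rather than a single optimum.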
LI Chengjian, SONG Shuyi, SU Yu, CHEN Zhibin. Review of Multi-Objective Traveling Salesman Problem Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2025, 61(12): 28-44.