
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (12): 28-44. DOI: 10.3778/j.issn.1002-8331.2410-0274
LI Chengjian, SONG Shuyi, SU Yu, CHEN Zhibin
Online: 2025-06-15
Published: 2025-06-13
Abstract: The multi-objective traveling salesman problem (MOTSP) is a combinatorial optimization problem (COP) of significant practical value, arising widely in logistics distribution, production scheduling, and network communication. MOTSP requires not only balancing multiple objectives but also finding diverse Pareto sets, which represent the globally or locally optimal trade-offs among those objectives. When solving MOTSP, traditional multi-objective optimization algorithms typically face high computational complexity and low solving efficiency, and struggle to find diverse Pareto-optimal solutions, especially when balancing diversity in the decision space against diversity in the objective space. In recent years, deep reinforcement learning (DRL) has shown great potential on complex optimization problems, offering a new approach to solving MOTSP and to diversifying its Pareto sets. This paper introduces the basic concepts of MOTSP and its solution methods; discusses optimization strategies in reinforcement learning (RL) and neural network models in deep learning (DL); summarizes theoretical DRL approaches to solving MOTSP, analyzes the optimization quality and time efficiency of representative models, highlights the performance of different DRL models on large-scale MOTSP instances, and examines their limitations, directions for improvement, and applicable scenarios, together with strategies for escaping local optima. Finally, it categorizes four major application areas of MOTSP and points out future research directions.
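The abstract hinges on two notions: a tour's cost vector under multiple objectives, and the Pareto set of mutually non-dominated solutions. A minimal brute-force sketch (the instance, function names, and cost matrices below are illustrative assumptions, not taken from the paper) makes these definitions concrete for a small bi-objective TSP:

```python
import itertools
import random

def tour_costs(tour, dist1, dist2):
    """Evaluate one tour under two cost matrices (e.g. distance and travel time)."""
    n = len(tour)
    c1 = sum(dist1[tour[i]][tour[(i + 1) % n]] for i in range(n))
    c2 = sum(dist2[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return c1, c2

def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(n, dist1, dist2):
    """Enumerate all tours of n cities and keep the non-dominated cost vectors."""
    points = []
    for perm in itertools.permutations(range(1, n)):  # fix city 0 to break rotational symmetry
        points.append(tour_costs((0,) + perm, dist1, dist2))
    return sorted({p for p in points
                   if not any(dominates(q, p) for q in points)})

random.seed(0)
n = 7
d1 = [[abs(i - j) for j in range(n)] for i in range(n)]            # objective 1: line distance
d2 = [[random.randint(1, 9) for _ in range(n)] for _ in range(n)]  # objective 2: random "time"
front = pareto_front(n, d1, d2)
print(front)
```

Exhaustive enumeration is only feasible for toy sizes (here 6! tours); the DRL methods surveyed in the paper exist precisely because this enumeration explodes combinatorially, and because the two objectives generally conflict, yielding a whole front of trade-off tours rather than a single optimum.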
LI Chengjian, SONG Shuyi, SU Yu, CHEN Zhibin. Review of Multi-Objective Traveling Salesman Problem Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2025, 61(12): 28-44.