计算机工程与应用 ›› 2025, Vol. 61 ›› Issue (12): 28-44.DOI: 10.3778/j.issn.1002-8331.2410-0274

• 热点与综述 • 上一篇    下一篇

深度强化学习求解多目标旅行商问题的研究综述

李成健,宋姝谊,粟宇,陈智斌   

  1. 昆明理工大学 理学院,昆明 650500
  • 出版日期:2025-06-15 发布日期:2025-06-13

Review of Multi-Objective Traveling Salesman Problem Based on Deep Reinforcement Learning

LI Chengjian, SONG Shuyi, SU Yu, CHEN Zhibin   

  1. Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China
  • Online:2025-06-15 Published:2025-06-13

摘要: 多目标旅行商问题(MOTSP)是一个具有显著应用价值的组合优化问题(COP),在物流配送、生产调度和网络通信等领域广泛存在。MOTSP不仅需要在多个目标之间寻求平衡,还要求找到不同的帕累托解集,这些解集代表了MOTSP在不同目标之间的全局或局部最优解。传统的多目标优化算法在解决MOTSP时,通常面临计算复杂度高和求解效率低的挑战,尤其是在均衡决策空间和目标空间多样性时,难以有效找到多样化的帕累托最优解。近年来,深度强化学习(DRL)在处理复杂优化问题上展现了巨大的潜力,为解决MOTSP及其帕累托解集的多样化问题提供了一种新的方法。介绍了MOTSP的基本概念和求解方法;讨论了强化学习(RL)中的优化策略和深度学习(DL)的神经网络模型;总结了利用DRL求解MOTSP的理论方法,分析了各代表性模型的优化效果与时效性,突出不同DRL模型在大规模MOTSP问题中的性能表现,并探讨了其局限性、改进方向和适用场景,同时提出了应对局部最优问题的策略。最后,归纳了MOTSP的四大应用研究领域,并指出了MOTSP的未来研究方向。

关键词: 深度强化学习(DRL), 多目标旅行商问题(MOTSP), 帕累托最优解, 优化策略, 神经网络模型

Abstract: The multi-objective traveling salesman problem (MOTSP) is a combinatorial optimization problem (COP) with significant practical applications, widely existing in fields such as logistics distribution, production scheduling and network communication. MOTSP not only needs to seek balance between multiple objectives, but also requires finding different Pareto solution sets that represent the global or local optimal solutions of MOTSP between different objectives. Traditional multi-objective optimization algorithms often face challenges of high computational complexity and low solving efficiency when solving MOTSP, especially when balancing the diversity of decision space and objective space. It is difficult to effectively find diverse Pareto optimal solutions. In recent years, deep reinforcement learning (DRL) has shown great potential in dealing with complex optimization problems, providing a new approach to solving the diversity problem of MOTSP and its Pareto solution set. First of all, the basic concepts and solving methods of MOTSP are introduced. Secondly, optimization strategies in reinforcement learning (RL) and neural network models for deep learning (DL) are discussed. Then, the theoretical methods of using DRL to solve MOTSP are summarized, and the optimization effects and timeliness of various representative models are analyzed. The performance of different DRL models in large-scale MOTSP problems is highlighted, and their limitations, improvement directions, and applicable scenarios are discussed. At the same time, strategies for addressing local optimal problems are proposed. Finally, the four major application research areas of MOTSP are summarized, and the future research directions of MOTSP are pointed out.

Key words: deep reinforcement learning (DRL), multi-objective traveling salesman problem (MOTSP), Pareto optimal solution, optimization strategy, neural network model