
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 1-28. DOI: 10.3778/j.issn.1002-8331.2412-0248
刘延飞1,王程锦1,2+,李超1
LIU Yanfei1, WANG Chengjin1,2+, LI Chao1
Online: 2025-12-15
Published: 2025-12-15
Abstract: Software-defined networking (SDN), with its global and centralized management architecture, has brought revolutionary changes to the management of complex, dynamic networks and created favorable conditions for implementing network traffic engineering. At the same time, reinforcement learning has attracted considerable attention for its marked advantages in decision optimization. Combining reinforcement learning with the distinctive architecture of SDN and applying the combination to traffic engineering is therefore of significant practical value. Following the line of technological development, this survey comprehensively reviews, at both the theoretical and the application level, research progress on reinforcement learning, deep reinforcement learning, and multi-agent deep reinforcement learning in SDN traffic engineering. Existing results are summarized, organized, and analyzed along several dimensions, including method category, network scenario, reinforcement learning algorithm, and traffic engineering objective, providing a multi-perspective basis for implementing SDN traffic engineering strategies. Progress on combining reinforcement learning with other techniques is further reviewed, demonstrating its potential for improving the performance of traffic engineering policies. On the basis of this review, current challenges are analyzed and future research directions are proposed, offering a reference for deeper exploration of this field.
LIU Yanfei, WANG Chengjin, LI Chao. Survey on Traffic Engineering in Software-Defined Networking Based on Reinforcement Learning[J]. Computer Engineering and Applications, 2025, 61(24): 1-28.
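
To make the surveyed lineage concrete, the sketch below is a minimal, self-contained example of the earliest family of techniques the survey covers: tabular Q-learning used for hop-by-hop, delay-aware route selection, with the learned greedy policy playing the role of the forwarding rules an SDN controller could install. The topology, the reward signal (negative link delay), and all hyperparameters here are illustrative assumptions for this sketch, not taken from any specific work the survey reviews.

```python
import random
from collections import defaultdict

# Illustrative toy topology (assumed for this sketch): adjacency list
# mapping each switch to its neighbors and per-link delays in ms.
TOPOLOGY = {
    "s1": {"s2": 2.0, "s3": 5.0},
    "s2": {"s1": 2.0, "s4": 2.0},
    "s3": {"s1": 5.0, "s4": 1.0},
    "s4": {"s2": 2.0, "s3": 1.0},
}
DST = "s4"                             # single destination, for simplicity
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# Q[node][next_hop]: estimated (negative) delay-to-destination when a flow
# at `node` is forwarded toward neighbor `next_hop`.
Q = defaultdict(lambda: defaultdict(float))

def choose_next_hop(node: str) -> str:
    """Epsilon-greedy next-hop selection over the node's neighbors."""
    neighbors = list(TOPOLOGY[node])
    if random.random() < EPSILON:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: Q[node][n])

def train(episodes: int = 5000) -> None:
    for _ in range(episodes):
        node = random.choice([n for n in TOPOLOGY if n != DST])
        while node != DST:
            nxt = choose_next_hop(node)
            reward = -TOPOLOGY[node][nxt]  # penalize per-hop link delay
            best_next = 0.0 if nxt == DST else max(Q[nxt].values(), default=0.0)
            # Standard Q-learning temporal-difference update.
            Q[node][nxt] += ALPHA * (reward + GAMMA * best_next - Q[node][nxt])
            node = nxt

if __name__ == "__main__":
    train()
    # Greedy next hop per switch, i.e. what a controller would turn
    # into flow-table entries toward DST.
    for node in sorted(TOPOLOGY):
        if node != DST:
            print(node, "->", max(TOPOLOGY[node], key=lambda n: Q[node][n]))
```

The deep and multi-agent variants covered by the survey replace the table `Q` with neural function approximators and distribute the decision across cooperating agents, but the temporal-difference update shown here remains the conceptual core.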