
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 1-28. DOI: 10.3778/j.issn.1002-8331.2412-0248
刘延飞1,王程锦1,2+,李超1
LIU Yanfei1, WANG Chengjin1,2+, LI Chao1
Online: 2025-12-15
Published: 2025-12-15
Abstract: Software-defined networking (SDN), with its global and centralized management architecture, has brought revolutionary changes to the management of complex, dynamic networks and created favorable conditions for implementing network traffic engineering. At the same time, reinforcement learning has attracted considerable attention for its marked advantages in decision optimization. Combining reinforcement learning with the distinctive architecture of SDN and applying the combination to traffic engineering is therefore of significant practical value. Following the line of technological development, this survey comprehensively reviews, at both the theoretical and the application level, research progress on reinforcement learning, deep reinforcement learning, and multi-agent deep reinforcement learning in SDN traffic engineering. Existing results are summarized, organized, and analyzed along several dimensions, including method category, network scenario, reinforcement learning algorithm, and traffic engineering objective, providing a multi-perspective basis for implementing SDN traffic engineering strategies. Progress on combining reinforcement learning with other techniques is further reviewed, demonstrating its potential for improving the performance of traffic engineering policies. On the basis of this review, current challenges are analyzed and future research directions are proposed, offering a reference for deeper exploration of this field.
LIU Yanfei, WANG Chengjin, LI Chao. Survey on Traffic Engineering in Software-Defined Networking Based on Reinforcement Learning[J]. Computer Engineering and Applications, 2025, 61(24): 1-28.
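
To make the surveyed lineage concrete, the sketch below is a minimal, self-contained example of the earliest family of techniques the survey covers: tabular Q-learning used for hop-by-hop, delay-aware route selection, with the learned greedy policy playing the role of the forwarding rules an SDN controller could install. The topology, the reward signal (negative link delay), and all hyperparameters here are illustrative assumptions for this sketch, not taken from any specific work the survey reviews.

```python
import random
from collections import defaultdict

# Illustrative toy topology (assumed for this sketch): adjacency list
# mapping each switch to its neighbors and per-link delays in ms.
TOPOLOGY = {
    "s1": {"s2": 2.0, "s3": 5.0},
    "s2": {"s1": 2.0, "s4": 2.0},
    "s3": {"s1": 5.0, "s4": 1.0},
    "s4": {"s2": 2.0, "s3": 1.0},
}
DST = "s4"                             # single destination, for simplicity
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# Q[node][next_hop]: estimated (negative) delay-to-destination when a flow
# at `node` is forwarded toward neighbor `next_hop`.
Q = defaultdict(lambda: defaultdict(float))

def choose_next_hop(node: str) -> str:
    """Epsilon-greedy next-hop selection over the node's neighbors."""
    neighbors = list(TOPOLOGY[node])
    if random.random() < EPSILON:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: Q[node][n])

def train(episodes: int = 5000) -> None:
    for _ in range(episodes):
        node = random.choice([n for n in TOPOLOGY if n != DST])
        while node != DST:
            nxt = choose_next_hop(node)
            reward = -TOPOLOGY[node][nxt]  # penalize per-hop link delay
            best_next = 0.0 if nxt == DST else max(Q[nxt].values(), default=0.0)
            # Standard Q-learning temporal-difference update.
            Q[node][nxt] += ALPHA * (reward + GAMMA * best_next - Q[node][nxt])
            node = nxt

if __name__ == "__main__":
    train()
    # Greedy next hop per switch, i.e. what a controller would turn
    # into flow-table entries toward DST.
    for node in sorted(TOPOLOGY):
        if node != DST:
            print(node, "->", max(TOPOLOGY[node], key=lambda n: Q[node][n]))
```

The deep and multi-agent variants covered by the survey replace the table `Q` with neural function approximators and distribute the decision across cooperating agents, but the temporal-difference update shown here remains the conceptual core.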