Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (4): 1-24. DOI: 10.3778/j.issn.1002-8331.2407-0034
Research Progress on Multi-Agent Deep Reinforcement Learning and Scalability
LIU Yanfei, LI Chao, WANG Zhong, WANG Jieling
Online: 2025-02-15
Published: 2025-02-14
Abstract: In recent years, multi-agent deep reinforcement learning has shown great potential for solving problems of agent cooperation, competition, and communication. However, as it is applied in more and more domains, its scalability has drawn increasing attention and has become a key issue in moving from theoretical research to large-scale engineering applications. This survey reviews reinforcement learning theory and typical deep reinforcement learning algorithms, introduces the three learning paradigms of multi-agent deep reinforcement learning together with their representative algorithms, and briefly summarizes the current mainstream open-source experimental platforms. It then examines in detail the research progress on the scalability of multi-agent deep reinforcement learning with respect to the number of agents and to scenarios, analyzes the core problems faced in each case, and presents existing approaches to solving them. Finally, it looks ahead to the application prospects and development trends of multi-agent deep reinforcement learning, providing a reference and inspiration for further research in this field.
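As a brief illustration of the reinforcement-learning background the survey reviews (standard textbook formulations, not equations reproduced from the paper itself), the optimal action-value function satisfies the Bellman optimality equation, and deep Q-networks (DQN) approximate it by minimizing a temporal-difference loss against a target network using samples drawn from a replay buffer:

$$ Q^{*}(s,a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^{*}(s',a') \,\big|\, s,a \,\big] $$

$$ L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \Big[ \big( r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta) \big)^{2} \Big] $$

Here $\gamma$ is the discount factor, $\theta^{-}$ denotes the periodically updated target-network parameters, and $\mathcal{D}$ is the replay buffer; the multi-agent learning paradigms and scalability techniques discussed in the survey build on this single-agent foundation.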
LIU Yanfei, LI Chao, WANG Zhong, WANG Jieling. Research Progress on Multi-Agent Deep Reinforcement Learning and Scalability[J]. Computer Engineering and Applications, 2025, 61(4): 1-24.