
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (18): 41-60. DOI: 10.3778/j.issn.1002-8331.2502-0087
XIONG Liqin, CHEN Xiliang, LAI Jun, LUO Xijian, CAO Lei
Online: 2025-09-15
Published: 2025-09-15
摘要: 近年来,多智能体深度强化学习发展迅速并被广泛用于各种多智能体协同任务,已经成为人工智能领域的一个研究热点,但如何实现多智能体高效协同仍是其当前面临的重大挑战之一。作为一种流行的解决方案,面向关系建模的合作多智能体深度强化学习方法通过刻画智能体与智能体、智能体与系统整体的关系来准确捕获并利用智能体的个体贡献和智能体间相互作用以有效促进多智能体协同,具有重要研究意义和应用价值。简要介绍多智能体系统中存在的关系和多智能体深度强化学习的基础知识;从关系建模层次的角度出发将面向关系建模的合作多智能体深度强化学习算法分为基于个体间关系建模、基于个体与全局间关系建模以及基于多尺度关系建模这三类,并对其基本原理及优缺点进行全面梳理;着重介绍了其在无人集群控制、任务与资源分配、智能交通运输等领域中的应用情况。最后,总结当前面临的主要挑战并对未来研究方向进行展望。
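Among the individual-to-global relation modeling methods the survey covers, additive value decomposition (in the style of VDN) is the simplest instance: the joint team value is factored into per-agent utilities, so each agent can act greedily on its own local values while remaining consistent with the team optimum. The following is a minimal illustrative sketch in plain Python, not any paper's reference implementation; the toy utility numbers are hypothetical.

```python
# Sketch of VDN-style value decomposition (individual-to-global relation
# modeling): Q_tot(a_1..a_n) = sum_i Q_i(a_i). Under this additive mixing,
# per-agent greedy action selection recovers the joint-greedy action
# (the Individual-Global-Max property).

def joint_q(per_agent_q, joint_action):
    """Additive mixing of per-agent utilities into the team value."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def decentralized_greedy(per_agent_q):
    """Each agent independently picks the argmax of its local utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)

# Two agents, two actions each (hypothetical local utilities).
qs = [[1.0, 3.0], [2.0, 0.5]]
greedy = decentralized_greedy(qs)
best = max(
    ((a1, a2) for a1 in range(2) for a2 in range(2)),
    key=lambda a: joint_q(qs, a),
)
assert greedy == best  # additive mixing preserves the global argmax
```

Monotonic mixing networks such as QMIX generalize this sum to any state-conditioned monotonic function of the per-agent utilities, keeping the same decentralized-greedy property.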
XIONG Liqin, CHEN Xiliang, LAI Jun, LUO Xijian, CAO Lei. Survey of cooperative multi-agent deep reinforcement learning based on relational modeling[J]. Computer Engineering and Applications, 2025, 61(18): 41-60.