Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (20): 43-64. DOI: 10.3778/j.issn.1002-8331.2203-0467
LIU Zhifei, CAO Lei, LAI Jun, CHEN Xiliang, CHEN Ying
Online: 2022-10-15
Published: 2022-10-15
Abstract: Multi-agent path finding (MAPF) is the problem of planning paths for multiple agents, under the key constraint that all agents can follow their planned paths simultaneously without conflicting with one another. MAPF has extensive applications in logistics, military operations, security, and other domains. This survey systematically organizes and classifies the main MAPF research results from China and abroad. By planning paradigm, MAPF algorithms divide into centralized planning algorithms and distributed execution algorithms. Centralized planning algorithms are the most classical and most commonly used MAPF algorithms, and fall into four families: A*-based search, conflict-based search, increasing cost tree search, and reduction-based algorithms. Distributed execution algorithms are the reinforcement-learning-based MAPF algorithms that have emerged in the artificial intelligence field; by the improvement technique adopted, they fall into three families: expert demonstration, improved communication, and task decomposition. On the basis of this classification, the survey compares the characteristics and applicability of the various MAPF algorithms, analyzes the strengths and weaknesses of existing algorithms, points out the challenges they face, and outlines directions for future work.
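To make the key no-conflict constraint concrete, the following minimal Python sketch (an illustrative example of ours, not code from any surveyed algorithm; the name `first_conflict` and the wait-at-goal convention are assumptions) checks a pair of planned grid paths for the two standard MAPF conflict types: vertex conflicts (two agents occupy the same cell at the same timestep) and edge, or swap, conflicts (two agents exchange cells between consecutive timesteps).

```python
# Illustrative sketch of MAPF conflict checking between two agents' plans.
from typing import List, Optional, Tuple

Cell = Tuple[int, int]   # (row, col) on a grid map
Path = List[Cell]        # path[t] is the agent's cell at timestep t


def position_at(path: Path, t: int) -> Cell:
    """Agents are assumed to wait at their goal after their path ends."""
    return path[t] if t < len(path) else path[-1]


def first_conflict(p1: Path, p2: Path) -> Optional[Tuple[str, int]]:
    """Return the first vertex or edge (swap) conflict between two paths, if any."""
    horizon = max(len(p1), len(p2))
    for t in range(horizon):
        # Vertex conflict: both agents in the same cell at timestep t.
        if position_at(p1, t) == position_at(p2, t):
            return ("vertex", t)
        # Edge conflict: the agents swap cells between timesteps t and t+1.
        if (t + 1 < horizon
                and position_at(p1, t) == position_at(p2, t + 1)
                and position_at(p2, t) == position_at(p1, t + 1)):
            return ("edge", t)
    return None


if __name__ == "__main__":
    a = [(0, 0), (0, 1)]
    b = [(0, 1), (0, 0)]           # head-on: the agents swap cells
    print(first_conflict(a, b))    # -> ('edge', 0)
```

Centralized solvers such as conflict-based search use exactly this kind of check: they detect the first conflict between two agents' paths and then branch, adding constraints that forbid the conflicting move for one agent or the other.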
LIU Zhifei, CAO Lei, LAI Jun, CHEN Xiliang, CHEN Ying. Overview of Multi-Agent Path Finding[J]. Computer Engineering and Applications, 2022, 58(20): 43-64.