Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (20): 43-64. DOI: 10.3778/j.issn.1002-8331.2203-0467
LIU Zhifei, CAO Lei, LAI Jun, CHEN Xiliang, CHEN Ying
Online: 2022-10-15
Published: 2022-10-15
Abstract: Multi-agent path finding (MAPF) is the problem of planning paths for multiple agents, under the key constraint that all agents can follow their planned paths simultaneously without conflicting with one another. MAPF has extensive applications in logistics, military operations, security, and other domains. This survey systematically organizes and classifies the main MAPF research results from China and abroad. By planning paradigm, MAPF algorithms divide into centralized planning algorithms and distributed execution algorithms. Centralized planning algorithms are the most classical and most commonly used MAPF algorithms, and fall into four families: A*-based search, conflict-based search, increasing cost tree search, and reduction-based algorithms. Distributed execution algorithms are the reinforcement-learning-based MAPF algorithms that have emerged in the artificial intelligence field; by the improvement technique adopted, they fall into three families: expert demonstration, improved communication, and task decomposition. On the basis of this classification, the survey compares the characteristics and applicability of the various MAPF algorithms, analyzes the strengths and weaknesses of existing algorithms, points out the challenges they face, and outlines directions for future work.
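To make the key no-conflict constraint concrete, the following minimal Python sketch (an illustrative example of ours, not code from any surveyed algorithm; the name `first_conflict` and the wait-at-goal convention are assumptions) checks a pair of planned grid paths for the two standard MAPF conflict types: vertex conflicts (two agents occupy the same cell at the same timestep) and edge, or swap, conflicts (two agents exchange cells between consecutive timesteps).

```python
# Illustrative sketch of MAPF conflict checking between two agents' plans.
from typing import List, Optional, Tuple

Cell = Tuple[int, int]   # (row, col) on a grid map
Path = List[Cell]        # path[t] is the agent's cell at timestep t


def position_at(path: Path, t: int) -> Cell:
    """Agents are assumed to wait at their goal after their path ends."""
    return path[t] if t < len(path) else path[-1]


def first_conflict(p1: Path, p2: Path) -> Optional[Tuple[str, int]]:
    """Return the first vertex or edge (swap) conflict between two paths, if any."""
    horizon = max(len(p1), len(p2))
    for t in range(horizon):
        # Vertex conflict: both agents in the same cell at timestep t.
        if position_at(p1, t) == position_at(p2, t):
            return ("vertex", t)
        # Edge conflict: the agents swap cells between timesteps t and t+1.
        if (t + 1 < horizon
                and position_at(p1, t) == position_at(p2, t + 1)
                and position_at(p2, t) == position_at(p1, t + 1)):
            return ("edge", t)
    return None


if __name__ == "__main__":
    a = [(0, 0), (0, 1)]
    b = [(0, 1), (0, 0)]           # head-on: the agents swap cells
    print(first_conflict(a, b))    # -> ('edge', 0)
```

Centralized solvers such as conflict-based search use exactly this kind of check: they detect the first conflict between two agents' paths and then branch, adding constraints that forbid the conflicting move for one agent or the other.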
LIU Zhifei, CAO Lei, LAI Jun, CHEN Xiliang, CHEN Ying. Overview of Multi-Agent Path Finding[J]. Computer Engineering and Applications, 2022, 58(20): 43-64.