Computer Engineering and Applications, 2022, Vol. 58, Issue (9): 19-29. DOI: 10.3778/j.issn.1002-8331.2202-0297
• Research Hotspots and Reviews •
Survey of Opponent Modeling Methods and Applications in Intelligent Game Confrontation
WEI Tingting, YUAN Weilin, LUO Junren, ZHANG Wanpeng
Online: 2022-05-01
Published: 2022-05-01
魏婷婷,袁唯淋,罗俊仁,张万鹏
WEI Tingting, YUAN Weilin, LUO Junren, ZHANG Wanpeng. Survey of Opponent Modeling Methods and Applications in Intelligent Game Confrontation[J]. Computer Engineering and Applications, 2022, 58(9): 19-29.
魏婷婷, 袁唯淋, 罗俊仁, 张万鹏. 智能博弈对抗中的对手建模方法及其应用综述[J]. 计算机工程与应用, 2022, 58(9): 19-29.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2202-0297
References:
[1] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[2] BROWN N, SANDHOLM T. Superhuman AI for multiplayer poker[J]. Science, 2019, 365(6456): 885-890.
[3] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[4] 高巍, 罗俊仁, 袁唯淋, 等. 面向对手建模的意图识别方法综述[J]. 网络与信息安全学报, 2021, 7(4): 86-100.
GAO W, LUO J R, YUAN W L, et al. Survey of intention recognition for opponent modeling[J]. Chinese Journal of Network and Information Security, 2021, 7(4): 86-100.
[5] GRAYSON T. Mosaic warfare[R]. DARPA/STO, 2018.
[6] DAN J. Air combat evolution[EB/OL]. (2019-05-17)[2020-05-01]. https://www.darpa.mil/attachments/ACE_ProposersDayProgramBrief.pdf.
[7] NASH J. Non-cooperative games[J]. Annals of Mathematics, 1951: 286-295.
[8] ALBRECHT S V, STONE P. Autonomous agents modeling other agents: a comprehensive survey and open problems[J]. Artificial Intelligence, 2018, 258: 66-95.
[9] BILLINGS D, DAVIDSON A, SCHAUENBERG T, et al. Game-tree search with adaptation in stochastic imperfect-information games[C]//International Conference on Computers and Games. Berlin, Heidelberg: Springer, 2004: 21-34.
[10] BROWNE C B, POWLEY E, WHITEHOUSE D, et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1): 1-43.
[11] ALBRECHT S V, STONE P. Reasoning about hypothetical agent behaviours and their parameters[C]//Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 2017: 547-555.
[12] BOMBINI G, DI MAURO N, FERILLI S, et al. Classifying agent behaviour through relational sequential patterns[C]//KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications. Berlin, Heidelberg: Springer, 2010: 273-282.
[13] FREEDMAN R G, ZILBERSTEIN S. A unifying perspective of plan, activity, and intent recognition[C]//Proceedings of the Workshop on Plan, Activity, and Intent Recognition, 2019: 1-8.
[14] VIDAL J M, DURFEE E H. Recursive agent modeling using limited rationality[C]//Proceedings of the First International Conference on Multi-Agent Systems, 1995: 376-383.
[15] SONU E, DOSHI P. Scalable solutions of interactive POMDPs using generalized and bounded policy iteration[J]. Autonomous Agents and Multi-Agent Systems, 2015, 29(3): 455-494.
[16] HERNANDEZ-LEAL P, KAISERS M, BAARSLAG T, et al. A survey of learning in multiagent environments: dealing with non-stationarity[EB/OL]. (2019-03-11)[2021-06-01]. https://arxiv.org/abs/1707.09183v1.
[17] 罗俊仁, 张万鹏, 袁唯淋, 等. 面向多智能体博弈对抗的对手建模框架[J/OL]. 系统仿真学报: 1-13[2022-02-16]. http://kns.cnki.net/kcms/detail/11.3092.V.20210818.1041.007.html.
LUO J R, ZHANG W P, YUAN W L, et al. Research on opponent modeling framework for multi-agent game confrontation[J/OL]. Journal of System Simulation: 1-13[2022-02-16]. http://kns.cnki.net/kcms/detail/11.3092.V.20210818.1041.007.html.
[18] HE H, BOYD-GRABER J, KWOK K, et al. Opponent modeling in deep reinforcement learning[C]//International Conference on Machine Learning, 2016: 1804-1813.
[19] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19)[2021-06-03]. https://arxiv.org/abs/1312.5602.
[20] HONG Z W, SU S Y, SHANN T Y, et al. A deep policy inference Q-network for multi-agent systems[EB/OL]. (2018-04-09)[2021-06-05]. https://arxiv.org/abs/1712.07893.
[21] EVERETT R, ROBERTS S. Learning against non-stationary agents with opponent modeling and deep reinforcement learning[C]//2018 AAAI Spring Symposium Series, 2018.
[22] TIAN Z, WEN Y, GONG Z, et al. A regularized opponent model with maximum entropy objective[EB/OL]. (2019-08-19)[2021-06-06]. https://arxiv.org/abs/1905.08087.
[23] AL-SHEDIVAT M, BANSAL T, BURDA Y, et al. Continuous adaptation via meta-learning in nonstationary and competitive environments[EB/OL]. (2018-02-23)[2021-06-10]. https://arxiv.org/abs/1710.03641.
[24] WU Z, LI K, ZHAO E, et al. L2E: learning to exploit your opponent[EB/OL]. (2021-01-18)[2021-06-30]. https://arxiv.org/abs/2102.09381.
[25] RABINOWITZ N, PERBET F, SONG F, et al. Machine theory of mind[C]//International Conference on Machine Learning, 2018: 4218-4227.
[26] WEN Y, YANG Y, LUO R, et al. Modeling bounded rationality in multi-agent interactions by generalized recursive reasoning[EB/OL]. (2020-03-20)[2021-07-20]. https://arxiv.org/abs/1901.09216.
[27] WEN Y, YANG Y, LUO R, et al. Probabilistic recursive reasoning for multi-agent reinforcement learning[EB/OL]. (2019-03-01)[2021-07-21]. https://arxiv.org/abs/1901.09207v2.
[28] FOERSTER J N, CHEN R Y, AL-SHEDIVAT M, et al. Learning with opponent-learning awareness[EB/OL]. (2018-09-19)[2021-06-18]. https://arxiv.org/abs/1709.04326.
[29] DAVIES I, TIAN Z, WANG J. Learning to model opponent learning (student abstract)[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 13771-13772.
[30] SOUTHEY F, BOWLING M P, LARSON B, et al. Bayes' bluff: opponent modeling in poker[EB/OL]. (2012-07-04)[2021-07-25]. https://arxiv.org/abs/1207.1411.
[31] GANZFRIED S, SUN Q. Bayesian opponent exploitation in imperfect-information games[C]//2018 IEEE Conference on Computational Intelligence and Games (CIG), 2018: 1-8.
[32] ZHENG Y, MENG Z, HAO J, et al. A deep Bayesian policy reuse approach against non-stationary agents[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018: 962-972.
[33] YU X, JIANG J, JIANG H, et al. Model-based opponent modeling[EB/OL]. (2021-09-04)[2021-07-25]. https://arxiv.org/abs/2108.01843.
[34] HERNANDEZ-LEAL P, ROSMAN B, TAYLOR M E, et al. A Bayesian approach for learning and tracking switching, non-stationary opponents[C]//Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016: 1315-1316.
[35] HARTFORD J S. Deep learning for predicting human strategic behavior[D]. University of British Columbia, 2016.
[36] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[37] VANSCHOREN J. Meta-learning: a survey[EB/OL]. (2018-10-08)[2021-07-28]. https://arxiv.org/abs/1810.03548.
[38] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//International Conference on Machine Learning, 2017: 1126-1135.
[39] DE WEERD H, VERBRUGGE R, VERHEIJ B. How much does it help to know what she knows you know? An agent-based simulation study[J]. Artificial Intelligence, 2013, 199: 67-92.
[40] TIAN R, TOMIZUKA M, SUN L. Learning human rewards by inferring their latent intelligence levels in multi-agent games: a theory-of-mind approach with application to driving data[C]//2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: 4560-4567.
[41] CAMERER C F, HO T H, CHONG J K. A cognitive hierarchy theory of one-shot games: some preliminary results[J]. Levine's Bibliography, 2003, 127(5): 7-42.
[42] WRIGHT J R, LEYTON-BROWN K. Level-0 models for predicting human behavior in games[J]. Journal of Artificial Intelligence Research, 2019, 64: 357-383.
[43] DUAN Y, SCHULMAN J, CHEN X, et al. RL2: fast reinforcement learning via slow reinforcement learning[EB/OL]. (2016-11-10)[2021-07-30]. https://arxiv.org/abs/1611.02779v2.
[44] LOWE R, WU Y I, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in Neural Information Processing Systems, 2017, 30.
[45] RUSU A A, COLMENAREJO S G, GULCEHRE C, et al. Policy distillation[EB/OL]. (2016-01-07)[2021-08-10]. https://arxiv.org/abs/1511.06295.
[46] BROWN N, SANDHOLM T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals[J]. Science, 2018, 359(6374): 418-424.
[47] 吴松. 德州扑克中对手模型的研究[D]. 哈尔滨: 哈尔滨工业大学, 2013.
WU S. Research of opponent modeling in Texas Hold'em[D]. Harbin: Harbin Institute of Technology, 2013.
[48] 张加佳. 非完备信息机器博弈中风险及对手模型的研究[D]. 哈尔滨: 哈尔滨工业大学, 2015.
ZHANG J J. Research on risk and opponent modeling in imperfect information game[D]. Harbin: Harbin Institute of Technology, 2015.
[49] 毛建博. 基于虚拟自我对局的多人非完备信息机器博弈策略研究[D]. 哈尔滨: 哈尔滨工业大学, 2018.
MAO J B. Research on multi-player imperfect information game strategy based on fictitious self-play[D]. Harbin: Harbin Institute of Technology, 2018.
[50] 吴天栋. 非完备信息机器博弈算法及对手模型的研究[D]. 武汉: 武汉理工大学, 2018.
WU T D. Research on incomplete information machine game algorithm and opponent model[D]. Wuhan: Wuhan University of Technology, 2018.
[51] LI X, MIIKKULAINEN R. Opponent modeling and exploitation in poker using evolved recurrent neural networks[C]//Proceedings of the Genetic and Evolutionary Computation Conference, 2018: 189-196.
[52] NASHED S, ZILBERSTEIN S. A survey of opponent modeling in adversarial domains[J]. Journal of Artificial Intelligence Research, 2022, 73: 277-327.
[53] JOHANSON M B. Robust strategies and counter-strategies: from superhuman to optimal play[D]. Alberta: University of Alberta, 2016.
[54] JOHANSON M B, BOWLING M, ZINKEVICH M. Computing robust counter-strategies[EB/OL]. (2007-09-26)[2022-03-18]. https://martin.zinkevich.org/publications/rnash.pdf.
[55] JOHANSON M B, BOWLING M. Data biased robust counter strategies[C]//Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, 2009: 264-271.
[56] ABOU RISK N, SZAFRON D. Using counterfactual regret minimization to create competitive multiplayer poker agents[C]//International Conference on Autonomous Agents and Multiagent Systems, 2010: 159-166.
[57] DIETTERICH T G. Ensemble learning[J]. The Handbook of Brain Theory and Neural Networks, 2002, 2(1): 110-125.
[58] EKMEKCI O, SIRIN V. Learning strategies for opponent modeling in poker[C]//Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
[59] 张宏达, 李德才, 何玉庆. 人工智能与"星际争霸": 多智能体博弈研究新进展[J]. 无人系统技术, 2019, 2(1): 5-16.
ZHANG H D, LI D C, HE Y Q. Artificial intelligence and StarCraft: new progress in multiagent game research[J]. Unmanned Systems Technology, 2019, 2(1): 5-16.
[60] WEBER B G, MATEAS M. A data mining approach to strategy prediction[C]//2009 IEEE Symposium on Computational Intelligence and Games, 2009: 140-147.
[61] URIARTE A, ONTANON S. Combat models for RTS games[J]. IEEE Transactions on Games, 2017, 10(1): 29-41.
[62] SYNNAEVE G, BESSIERE P. A Bayesian model for opening prediction in RTS games with application to StarCraft[C]//2011 IEEE Conference on Computational Intelligence and Games (CIG'11), 2011: 281-288.
[63] ONTANON S, BURO M. Adversarial hierarchical-task network planning for complex real-time games[C]//Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[64] LIN S, ANSHI Z, BO L, et al. HTN guided adversarial planning for RTS games[C]//2020 IEEE International Conference on Mechatronics and Automation (ICMA), 2020: 1326-1331.
[65] BROWN G W. Iterative solution of games by fictitious play[J]. Activity Analysis of Production and Allocation, 1951, 13(1): 374-376.
[66] HEINRICH J, LANCTOT M, SILVER D. Fictitious self-play in extensive-form games[C]//International Conference on Machine Learning, 2015.
[67] HEINRICH J, SILVER D. Deep reinforcement learning from self-play in imperfect-information games[EB/OL]. (2016-06-28)[2021-08-12]. https://arxiv.org/abs/1603.01121.
[68] ZHANG L, WANG W, LI S, et al. Monte Carlo neural fictitious self-play: approach to approximate Nash equilibrium of imperfect-information games[EB/OL]. (2019-04-06)[2021-08-16]. https://arxiv.org/abs/1903.09569v2.
[69] LANCTOT M, ZAMBALDI V, GRUSLYS A, et al. A unified game-theoretic approach to multiagent reinforcement learning[EB/OL]. (2017-11-07)[2021-08-20]. https://arxiv.org/abs/1711.00832v1.
[70] OMIDSHAFIEI S, PAPADIMITRIOU C, PILIOURAS G, et al. α-rank: multi-agent evaluation by evolution[J]. Scientific Reports, 2019, 9(1): 1-29.
[71] MULLER P, OMIDSHAFIEI S, ROWLAND M, et al. A generalized training approach for multiagent learning[EB/OL]. (2020-02-14)[2021-09-10]. https://arxiv.org/abs/1909.12823v2.
[72] MCALEER S, LANIER J, FOX R, et al. Pipeline PSRO: a scalable approach for finding approximate Nash equilibria in large games[EB/OL]. (2021-02-18)[2021-09-02]. https://arxiv.org/abs/2006.08555v2.
[73] TIAN Z, REN H, YANG Y, et al. Learning to safely exploit a non-stationary opponent[EB/OL]. (2021-05-22)[2021-09-14]. https://openreview.net/pdf?id=zoQJBVrhnn3.
[74] LIU M, WU C, LIU Q, et al. Safe opponent-exploitation subgame refinement[EB/OL]. (2021-09-29)[2022-01-14]. https://openreview.net/pdf?id=VwSHZgruNEc.
[75] 袁唯淋, 廖志勇, 高巍, 等. 计算机扑克智能博弈研究综述[J]. 网络与信息安全学报, 2021, 7(5): 57-76.
YUAN W L, LIAO Z Y, GAO W, et al. Survey on intelligent game of computer poker[J]. Chinese Journal of Network and Information Security, 2021, 7(5): 57-76.
Related Articles:
[1] GAO Jingpeng, HU Xinyu, JIANG Zhiye. Unmanned Aerial Vehicle Track Planning Algorithm Based on Improved DDPG[J]. Computer Engineering and Applications, 2022, 58(8): 264-272.
[2] ZHAO Shuxu, YUAN Lin, ZHANG Zhanping. Multi-agent Edge Computing Task Offloading[J]. Computer Engineering and Applications, 2022, 58(6): 177-182.
[3] DENG Xin, NA Jun, ZHANG Handuo, WANG Yulin, ZHANG Bin. Personalized Adjustment Method of Intelligent Lamp Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2022, 58(6): 264-270.
[4] XU Bo, ZHOU Jianguo, WU Jing, LUO Wei. Routing Optimization Method Based on DDPG and Programmable Data Plane[J]. Computer Engineering and Applications, 2022, 58(3): 143-150.
[5] SONG Haonan, ZHAO Gang, SUN Ruoying. Developments of Knowledge Reasoning Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2022, 58(1): 12-25.
[6] NIU Pengfei, WANG Xiaofeng, LU Lei, ZHANG Jiulong. Survey on Vehicle Reinforcement Learning in Routing Problem[J]. Computer Engineering and Applications, 2022, 58(1): 41-55.
[7] MA Zhihao, ZHU Xiangbin. Research on Quasi-hyperbolic Momentum Gradient for Adversarial Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(24): 90-99.
[8] LI Baoshuai, YE Chunming. Job Shop Scheduling Problem Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(23): 248-254.
[9] CHENG Yi, HAO Mimi. Path Planning for Indoor Mobile Robot with Improved Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(21): 256-262.
[10] KUANG Liqun, LI Siyuan, FENG Li, HAN Xie, XU Qingyu. Application of Deep Reinforcement Learning Algorithm on Intelligent Military Decision System[J]. Computer Engineering and Applications, 2021, 57(20): 271-278.
[11] KONG Songtao, LIU Chichi, SHI Yong, XIE Yi, WANG Kun. Review of Application Prospect of Deep Reinforcement Learning in Intelligent Manufacturing[J]. Computer Engineering and Applications, 2021, 57(2): 49-59.
[12] ZHANG Rongxia, WU Changxu, SUN Tongchao, ZHAO Zengshun. Progress on Deep Reinforcement Learning in Path Planning[J]. Computer Engineering and Applications, 2021, 57(19): 44-56.
[13] YANG Xueyu, CHEN Jianping, FU Qiming, LU You, WU Hongjie. Deep Deterministic Policy Gradient Algorithm Based on Stochastic Variance Reduction Method[J]. Computer Engineering and Applications, 2021, 57(19): 104-111.
[14] SONG Haonan, ZHAO Gang, WANG Xingfen. Knowledge Reasoning Method Combining Knowledge Representation with Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(19): 189-197.
[15] YANG Tong, QIN Jin. Adaptive ε-greedy Strategy Based on Average Episodic Cumulative Reward[J]. Computer Engineering and Applications, 2021, 57(11): 148-155.