Intelligent Game Countermeasures Algorithm Based on Opponent Action Prediction

doi:10.3778/j.issn.1002-8331.2111-0362

Abstract

In intelligent game confrontation scenarios, multi-agent reinforcement learning algorithms suffer from non-stationarity: an agent's policy depends not only on the environment but also on its opponents, that is, the other agents in the environment. Predicting an opponent's strategy and intention from its interactions with the environment, and adjusting the agent's own strategy accordingly, is an effective way to alleviate this problem. An intelligent game confrontation algorithm based on opponent action prediction is proposed, which implicitly models the opponent in the environment. The algorithm obtains the opponent's policy features through supervised learning and integrates them into the agent's reinforcement learning model, reducing the opponent's influence on learning stability. Simulation experiments in a 1v1 soccer environment show that the proposed algorithm effectively predicts the opponent's actions, accelerates learning convergence, and improves the agent's confrontation level.

Keywords: opponent action prediction, dueling double deep Q network (D3QN), intelligent game confrontation, deep reinforcement learning
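The abstract names two ingredients: a supervised opponent action predictor and a dueling double deep Q network (D3QN). The sketch below shows, in plain Python, the standard dueling aggregation, the standard double Q-learning target, and one plausible way to combine a temporal-difference loss with an opponent-prediction loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network architecture or how the opponent features are fused, and the function names and the `weight` hyperparameter are hypothetical.

```python
import math


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, gamma, done, next_q_online, next_q_target):
    """Double Q-learning target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def opponent_ce_loss(opp_logits, observed_action):
    """Supervised loss: cross-entropy between the predicted opponent action
    distribution and the action the opponent was observed to take."""
    return -math.log(softmax(opp_logits)[observed_action])


def combined_loss(q_taken, td_target, opp_logits, opp_action, weight=0.5):
    """Squared TD error plus a weighted opponent-prediction loss.
    `weight` is an assumed hyperparameter, not a value from the paper."""
    td_loss = (q_taken - td_target) ** 2
    return td_loss + weight * opponent_ce_loss(opp_logits, opp_action)


if __name__ == "__main__":
    q_values = dueling_q(1.0, [0.5, -0.5, 0.0])
    target = double_q_target(1.0, 0.9, False, [0.2, 0.8, 0.1], [1.0, 2.0, 3.0])
    print(q_values, target)
    print(combined_loss(q_values[0], target, [0.0, 0.0, 0.0], 2))
```

In a full agent these quantities would be computed by neural networks over batches of transitions; the scalar version here only makes the arithmetic of each term visible.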

摘要： 智能博弈对抗场景中，多智能体强化学习算法存在“非平稳性”问题，智能体的策略不仅取决于环境，还受到环境中对手（其他智能体）的影响。根据对手与环境的交互信息，预测其策略和意图，并以此调整智能体自身策略是缓解上述问题的有效方式。提出一种基于对手动作预测的智能博弈对抗算法，对环境中的对手进行隐式建模。该算法通过监督学习获得对手的策略特征，并将其与智能体的强化学习模型融合，缓解对手对学习稳定性的影响。在1v1足球环境中的仿真实验表明，提出的算法能够有效预测对手的动作，加快学习收敛速度，提升智能体的对抗水平。

关键词: 对手动作预测, 竞争双深度Q网络（D3QN）, 智能博弈对抗, 深度强化学习

HAN Runhai, CHEN Hao, LIU Quan, HUANG Jian. Intelligent Game Countermeasures Algorithm Based on Opponent Action Prediction[J]. Computer Engineering and Applications, 2023, 59(7): 190-197.

韩润海, 陈浩, 刘权, 黄健. 基于对手动作预测的智能博弈对抗算法[J]. 计算机工程与应用, 2023, 59(7): 190-197.
