Chinese Chess Game Using Statistical Data Parallel Monte Carlo Tree Search Algorithm

doi:10.3778/j.issn.1002-8331.2401-0241

Abstract

To address the slow convergence of Monte Carlo tree search (MCTS) and the loss of information at key nodes during play, this paper takes Chinese chess as the test bed, constructs a policy-value network suited to a Chinese chess game system, and proposes a statistics-based parallel Monte Carlo tree search algorithm (SPMCTS). Parallelization targets expansion and simulation, the two most time-consuming of the four MCTS steps, which effectively avoids idle waiting during execution. A new set of statistics is introduced to modify the node selection strategy in the MCTS selection step, so that more of the available information is obtained and used when selecting nodes, mitigating the effect of information loss on accuracy. Experimental results show that, compared with existing parallel MCTS algorithms, SPMCTS increases search speed by about 34%, and its win rate in game-playing experiments remains at about 80%, which verifies the effectiveness of SPMCTS.

Key words: Monte Carlo tree search, Chinese chess, game system, policy-value network, parallelization, statistics
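
The abstract does not give the selection formula or name the statistics it adds, so the following is only a minimal sketch of the general idea, under stated assumptions: each node keeps, besides its completed visit count, a hypothetical `pending` count of simulations that have started but not yet been backed up, and a PUCT-style score treats those in-flight simulations as visits. The `Node` fields, the `score` formula, and the constant `c_puct` are illustrative choices, not the paper's definitions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float = 1.0       # move prior, e.g. from a policy-value network
    visits: int = 0          # completed simulations through this node
    pending: int = 0         # in-flight simulations (assumed extra statistic)
    value_sum: float = 0.0   # sum of backed-up values
    children: dict = field(default_factory=dict)


def score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    """PUCT-style score that counts in-flight simulations as visits."""
    n_parent = parent.visits + parent.pending
    n_child = child.visits + child.pending
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(n_parent + 1) / (1 + n_child)
    return q + u


def select_path(root: Node) -> list:
    """Descend to a leaf, then mark every node on the path as in flight."""
    path = [root]
    while path[-1].children:
        parent = path[-1]
        best = max(parent.children.values(), key=lambda ch: score(parent, ch))
        path.append(best)
    for node in path:
        node.pending += 1
    return path


def backup(path: list, value: float) -> None:
    """Turn one in-flight simulation into a completed one along the path."""
    for node in reversed(path):
        node.pending -= 1
        node.visits += 1
        node.value_sum += value
        value = -value  # alternate perspective between the two players
```

With a root that has two children with priors 0.6 and 0.4, two calls to `select_path` made before any `backup` pick different children, because the first call's pending count lowers the exploration term of the child it chose. A plain visit count would send both workers to the same child, which is the kind of lost information that in-flight statistics are meant to recover.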


ZHU Zhou (朱舟), MIN Huasong (闵华松). Chinese Chess Game Using Statistical Data Parallel Monte Carlo Tree Search Algorithm[J]. Computer Engineering and Applications (计算机工程与应用), 2024, 60(23): 340-348.
