Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 116-123. DOI: 10.3778/j.issn.1002-8331.2401-0032

• Theory, Research and Development •

Research on Bounded Rational Game Algorithm for Ship Target Tracking Based on Reinforcement Learning

CHEN Suxia, XU Qingwen, LIU Jiufu, XIE Hui, LIU Xiangwu   

  1. Department of Computer and Art Design, Henan Light Industry Vocational College, Zhengzhou 450008, China
  2. College of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2024-10-15 Published: 2024-10-15

Abstract: Because real-world decision-makers cannot always analyze problems with perfect rationality, a pursuit-evasion game algorithm based on bounded rationality is proposed. A pursuit-evasion game model is established, and the saddle-point strategies of the two players under perfect rationality are solved first. The level-k model of bounded rationality is then introduced to impose a structural assumption on the depth of strategic reasoning of the pursuer and the evader, which allows the two sides to have different reasoning abilities. The value function and strategy corresponding to each level are given, and they satisfy the Hamilton-Jacobi-Isaacs (HJI) equation. As the level increases, the strategies eventually tend toward the Nash equilibrium. Because the HJI equation is difficult to solve directly, a reinforcement-learning actor-critic algorithm is used to solve it, and the algorithm is designed so that the pursuer can estimate the evader's thinking level and adopt an appropriate strategy. Taking ships as the object of study, ship motion is simplified to a two-dimensional mathematical model, a ship pursuit-evasion game model is established, and the algorithm is verified by simulation.

Key words: pursuit-evasion game, target tracking, reinforcement learning, bounded rationality
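
The level-k reasoning and level-estimation ideas described in the abstract can be illustrated with a minimal sketch. It assumes a toy static quadratic zero-sum game rather than the differential game, HJI equation, or ship model studied in the paper. The payoff `J`, the coefficients `A`, `B`, `C`, the level-0 anchor, and the nearest-action rule in `estimate_level` are all hypothetical choices made for illustration, and the sketch is not the authors' actor-critic algorithm.

```python
# Toy static zero-sum game (hypothetical, not the paper's ship model):
# the pursuer picks u to minimize J, the evader picks d to maximize J, with
#   J(u, d) = u**2 + 2*A*u*d - d**2 + 2*B*u + 2*C*d
A, B, C = 0.5, 1.0, 1.0  # illustrative coefficients; |A| < 1 makes the iteration contract


def best_u(d):
    """Pursuer's best response: solves dJ/du = 2u + 2*A*d + 2*B = 0."""
    return -(A * d + B)


def best_d(u):
    """Evader's best response: solves dJ/dd = 2*A*u - 2d + 2*C = 0."""
    return A * u + C


def level_k_strategies(max_level, u0=0.0, d0=0.0):
    """Return lists (us, ds) where index k holds the level-k action.

    Level 0 is a non-strategic anchor (u0, d0); a level-k player
    best-responds to an opponent assumed to be at level k-1.
    """
    us, ds = [u0], [d0]
    for _ in range(max_level):
        u_next = best_u(ds[-1])  # pursuer at level k vs. evader at level k-1
        d_next = best_d(us[-1])  # evader at level k vs. pursuer at level k-1
        us.append(u_next)
        ds.append(d_next)
    return us, ds


def estimate_level(observed_d, ds):
    """Guess the evader's level as the one whose action is closest to the observation."""
    return min(range(len(ds)), key=lambda k: abs(ds[k] - observed_d))


# Saddle point (Nash equilibrium) from solving both first-order conditions jointly.
U_STAR = -(A * C + B) / (1 + A ** 2)
D_STAR = A * U_STAR + C

if __name__ == "__main__":
    us, ds = level_k_strategies(30)
    print("level-30 actions:", us[-1], ds[-1])
    print("saddle point:    ", U_STAR, D_STAR)
    k_hat = estimate_level(0.5, ds)
    print("estimated evader level:", k_hat, "-> pursuer plays", best_u(ds[k_hat]))
```

In this toy setting the level-k actions approach the saddle point as the level grows, mirroring the convergence property stated in the abstract. A pursuer that has estimated the evader's level as `k_hat` responds as a level-`k_hat + 1` player by calling `best_u(ds[k_hat])`.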
