Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (8): 166-171. DOI: 10.3778/j.issn.1002-8331.1610-0348

• Pattern Recognition and Artificial Intelligence •

基于强化学习的无人坦克对战仿真研究

徐志雄,曹  雷,陈希亮   

  1. 解放军理工大学 指挥信息系统学院,南京 210000
  • Publication date: 2018-04-15  Release date: 2018-05-02

Research on unmanned tank battle simulation based on reinforcement learning

XU Zhixiong, CAO Lei, CHEN Xiliang   

  1. Institute of Command Information System, PLA University of Science and Technology, Nanjing 210000, China
  • Online: 2018-04-15  Published: 2018-05-02

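Abstract: Standard reinforcement learning is improved by adding a motivation layer that injects prior knowledge and accelerates learning. For policy iteration, the on-policy Sarsa learning algorithm replaces the traditional off-policy Q-learning algorithm. A multi-motivation-guided Sarsa learning (MMSarsa) algorithm is proposed, and the three algorithms (MMSarsa, Q-learning, and Sarsa learning) are compared experimentally on a tank battle simulation problem. The experimental results show that the multi-motivation-guided Sarsa learning algorithm converges quickly and learns efficiently.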

Keywords: multi-motivation guidance, Q-learning, Sarsa learning, unmanned tank, battle simulation

Abstract: Classic reinforcement learning is improved by introducing a motivation layer, which brings in prior knowledge and speeds up learning. For the policy iteration, the on-policy Sarsa learning algorithm is adopted instead of the traditional off-policy Q-learning algorithm. A Multi-Motivation Sarsa learning algorithm (MMSarsa) is proposed and compared with the Q-learning and Sarsa learning algorithms in tank battle simulation experiments. The experimental results show that MMSarsa has a fast convergence rate and high learning efficiency.

Key words: multi-motivation guidance, Q-learning, Sarsa learning, unmanned tank, battle simulation
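
The abstract contrasts on-policy Sarsa with off-policy Q-learning without stating the update rules. The following is a minimal sketch of the standard textbook tabular updates, included only to illustrate that distinction. It is not the authors' MMSarsa: the motivation layer is not specified in the abstract and is not modeled here. The function names, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy (max-value) action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def make_table():
    """Hypothetical Q-table: in state 's1', 'right' is worth more than 'left'."""
    Q = defaultdict(float)
    Q[("s1", "left")] = 1.0
    Q[("s1", "right")] = 5.0
    return Q


actions = ["left", "right"]

# Same transition (s0, right, reward 0, s1) for both learners.
Q_sarsa = make_table()
# Assume the behavior policy explored and picked 'left' in s1.
sarsa_update(Q_sarsa, "s0", "right", 0.0, "s1", "left")

Q_qlearn = make_table()
q_learning_update(Q_qlearn, "s0", "right", 0.0, "s1", actions)

print(Q_sarsa[("s0", "right")], Q_qlearn[("s0", "right")])
```

On the same transition, Sarsa's target follows the exploratory action that was actually chosen, while Q-learning's target follows the greedy action regardless of what the agent does next, so the two learners end up with different estimates for `("s0", "right")`.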