Research on unmanned tank battle simulation based on reinforcement learning

DOI: 10.3778/j.issn.1002-8331.1610-0348

Computer Engineering and Applications, 2018, Vol. 54, Issue 8: 166-171.

XU Zhixiong, CAO Lei, CHEN Xiliang

Institute of Command Information System, PLA University of Science and Technology, Nanjing 210000, China

Online: 2018-04-15; Published: 2018-05-02

Abstract

To improve standard reinforcement learning, a motivation layer is introduced to incorporate prior knowledge and speed up learning. For policy iteration, the on-policy Sarsa learning algorithm is adopted in place of the traditional off-policy Q-learning algorithm. On this basis, a Multi-Motivation Sarsa learning algorithm (MMSarsa) is proposed and compared with the Q-learning and Sarsa learning algorithms in tank battle simulation experiments. The experimental results show that the motivation-guided Sarsa learning algorithm converges quickly and learns efficiently.

Key words: multi-motivation guidance, Q-learning, Sarsa learning, unmanned tank, battle simulation
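The abstract's contrast between on-policy Sarsa and off-policy Q-learning comes down to the bootstrap target. Sarsa updates Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)] using the next action a' that the exploring policy actually takes, while Q-learning replaces Q(s',a') with max_b Q(s',b) regardless of what is executed next. The Python sketch below is a generic tabular illustration of those two rules, not the authors' MMSarsa: the motivation layer is not modeled, and the 1-D corridor environment (`step`), its rewards, and every hyperparameter are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = [-1, 1]  # hypothetical moves: -1 = step left, +1 = step right


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (ties -> first action)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy: bootstrap from Q(s', a') for the action a' the policy actually chose."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma, done):
    """Off-policy: bootstrap from the greedy value max_b Q(s', b), whatever is executed next."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def step(pos, action, length):
    """Hypothetical 1-D corridor: the goal is the rightmost cell; walls clamp movement."""
    nxt = min(max(pos + action, 0), length - 1)
    done = nxt == length - 1
    return nxt, (1.0 if done else -0.01), done


def train(method, episodes=500, length=6, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train a tabular agent with either 'sarsa' or 'q_learning' updates."""
    if method not in ("sarsa", "q_learning"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        s, done = 0, False
        a = epsilon_greedy(Q, s, epsilon, rng)
        while not done:
            s_next, r, done = step(s, a, length)
            if method == "sarsa":
                # Sarsa must pick a' before updating, because its target uses a'.
                a_next = epsilon_greedy(Q, s_next, epsilon, rng)
                sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done)
            else:
                # Q-learning updates first; the next action does not enter the target.
                q_learning_update(Q, s, a, r, s_next, ACTIONS, alpha, gamma, done)
                a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            s, a = s_next, a_next
    return Q


def greedy_policy(Q, length):
    """Greedy action in every non-terminal cell."""
    return [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(length - 1)]


if __name__ == "__main__":
    for method in ("sarsa", "q_learning"):
        print(method, greedy_policy(train(method), 6))
```

Running the module prints the greedy action for each non-terminal cell under each method; on this toy corridor both should settle on moving right (+1). The example only illustrates the two update rules and makes no claim about the convergence speeds reported in the paper.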

XU Zhixiong, CAO Lei, CHEN Xiliang. Research on unmanned tank battle simulation based on reinforcement learning[J]. Computer Engineering and Applications, 2018, 54(8): 166-171.

Related Articles

[1] SI Yanna, PU Jiexin, ZANG Shaofei. Neural network Q learning algorithm based on residual gradient method[J]. Computer Engineering and Applications, 2020, 56(18): 137-142.
[2] LI Fu-fang, XIE Dong-qing, QI De-yu, GUO Si-wen, HU Jing-lin. Research on Agent-based grid resource management[J]. Computer Engineering and Applications, 2009, 45(10): 30-33.
[3] HU Xiao-hui. Action choice mechanism of reinforcement learning based on adjusted dynamic parameters[J]. Computer Engineering and Applications, 2008, 44(28): 29-31.
[4] ZHOU Tong, HONG Bing-rong, PIAO Song-hao, ZHOU Hong-yu. Self-organizing coordination of multi-robot based on Monte Carlo learning[J]. Computer Engineering and Applications, 2007, 43(30): 23-25.
