Application of Deep Reinforcement Learning Algorithm on Intelligent Military Decision System
KUANG Liqun, LI Siyuan, FENG Li, HAN Xie, XU Qingyu
1.School of Data Science and Technology, North University of China, Taiyuan 030051, China
2.Department of Simulation Equipment, North Automatic Control Technology Institute, Taiyuan 030006, China
KUANG Liqun, LI Siyuan, FENG Li, HAN Xie, XU Qingyu. Application of Deep Reinforcement Learning Algorithm on Intelligent Military Decision System[J]. Computer Engineering and Applications, 2021, 57(20): 271-278.
[1] 殷昌盛,杨若鹏,邹小飞,等.指挥智能化研究综述[C]//第八届中国指挥控制大会论文集,2020.
YIN C S,YANG R P,ZOU X F,et al.A survey on military intelligent command[C]//Proceedings of the 8th China Command and Control Conference,2020.
[2] SUTTON R S,BARTO A G.Introduction to reinforcement learning[M].Cambridge:MIT Press,1998.
[3] LUONG N C,HOANG D T,GONG S,et al.Applications of deep reinforcement learning in communications and networking:a survey[J].IEEE Communications Surveys & Tutorials,2019,21(4):3133-3174.
[4] WATKINS C J C H,DAYAN P.Q-learning[J].Machine Learning,1992,8(3/4):279-292.
[5] JIANG H,GUI R,CHEN Z,et al.An improved sarsa(lambda) reinforcement learning algorithm for wireless communication systems[J].IEEE Access,2019,7:115418-115427.
[6] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[7] HASSELT H V.Double Q-learning[C]//Advances in Neural Information Processing Systems,2010:2613-2621.
[8] PANOV A I,YAKOVLEV K S,SUVOROV R.Grid path planning with deep reinforcement learning:preliminary results[J].Procedia Computer Science,2018,123:347-353.
[9] 杨克巍.半自治作战agent模型及其应用研究[D].长沙:国防科学技术大学,2004.
YANG K W.Research and application of semi-autonomous combat agent model[D].Changsha:National University of Defense Technology,2004.
[10] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing atari with deep reinforcement learning[J].arXiv:1312.5602,2013.
[11] 陈希亮,张永亮.基于深度强化学习的陆军分队战术决策问题研究[J].军事运筹与系统工程,2017,31(3):20-27.
CHEN X L,ZHANG Y L.Research on tactical decision of army unit based on deep reinforcement learning[J].Military Operations Research and Systems Engineering,2017,31(3):20-27.
[12] MA X,XIA L,ZHAO Q.Air-combat strategy using deep Q-learning[C]//2018 Chinese Automation Congress(CAC),2018:3952-3957.
[13] 姚桐,王越,董岩,等.深度强化学习在作战任务规划中的应用[J].飞航导弹,2020(4):16-21.
YAO T,WANG Y,DONG Y,et al.Application of deep reinforcement learning in combat mission planning[J].Aerodynamic Missile Journal,2020(4):16-21.
[14] LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[J].arXiv:1509.02971,2015.
[15] LI Y,QIU X H,LIU X D,et al.Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs[J].Journal of Systems Engineering and Electronics,2020,31(4):734-742.
[16] 郑健,陈建,朱琨.基于多智能体强化学习的无人集群协同设计[J].指挥信息系统与技术,2020,11(6):26-31.
ZHENG J,CHEN J,ZHU K.Unmanned swarm cooperative design based on multi-agent reinforcement learning[J].Command Information System and Technology,2020,11(6):26-31.
[17] 陈亮,梁宸,张景异,等.Actor-Critic框架下一种基于改进DDPG的多智能体强化学习算法[J].控制与决策,2021,36(1):75-82.
CHEN L,LIANG C,ZHANG J Y,et al.A multi-agent reinforcement learning algorithm based on improved DDPG in Actor-Critic framework[J].Control and Decision,2021,36(1):75-82.
[18] 赵毓,郭继峰,郑红星,等.基于强化学习的多无人机避碰计算制导方法[J].导航定位与授时,2021,8(1):31-40.
ZHAO Y,GUO J F,ZHENG H X,et al.Reinforcement learning-based collision avoidance guidance algorithm for fixed-wing UAVs[J].Navigation Positioning and Timing,2021,8(1):31-40.
[19] LI B,WU Y.Path planning for UAV ground target tracking via deep reinforcement learning[J].IEEE Access,2020,8:29064-29074.
[20] HUANG W,WANG Y,YI X.A deep reinforcement learning approach to preserve connectivity for multi-robot systems[C]//2017 10th International Congress on Image and Signal Processing,BioMedical Engineering and Informatics(CISP-BMEI),2017:1-7.
[21] 吴昭欣,李辉,王壮,等.基于深度强化学习的智能仿真平台设计[J].战术导弹技术,2020(4):193-200.
WU Z X,LI H,WANG Z,et al.The design of intelligence simulation platform based on DRL[J].Tactical Missile Technology,2020(4):193-200.
[22] HOU Y,LIU L,WEI Q,et al.A novel DDPG method with prioritized experience replay[C]//2017 IEEE International Conference on Systems,Man,and Cybernetics(SMC),2017:316-321.
[23] 吴球业.基于Actor-Critic结构的受扰倒立摆平衡控制研究[J].信息系统工程,2020(3):146-147.
WU Q Y.Research on balance control of disturbed inverted pendulum based on Actor-Critic structure[J].China CIO News,2020(3):146-147.
[24] SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[J].arXiv:1511.05952,2015.