Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (8): 264-272. DOI: 10.3778/j.issn.1002-8331.2106-0054

• Engineering and Applications •

Unmanned Aerial Vehicle Track Planning Algorithm Based on Improved DDPG

GAO Jingpeng, HU Xinyu, JIANG Zhiye   

  1. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE), Luoyang, Henan 471003, China
    2.College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
    3.National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing Institute of Space Long March Vehicle, Beijing 100076, China
  • Online: 2022-04-15  Published: 2022-04-15

Abstract: An improved DDPG track planning algorithm is proposed to address two problems: the high processing complexity that unknown threats during UAV flight impose on intelligent algorithms, which makes real-time track planning difficult, and the excessive time cost of tuning the parameters of the DDPG algorithm in deep reinforcement learning. For the UAV track planning problem, a flight scene model is constructed; the action space is built according to flight dynamics theory; the reward function is designed on the basis of a non-sparse reward idea; and, combined with the artificial bee colony algorithm, the update mechanism of the DDPG model parameters is improved. The network model is then trained to realize UAV track decision-making and control. Simulation results show that the overall training time of the proposed algorithm is only 1.98 times the average single training time of the prototype algorithm, which greatly improves network training efficiency and reduces time cost. Moreover, while satisfying real-time flight requirements, the proposed algorithm meets UAV track quality demands, providing a new idea for promoting the practical application of deep reinforcement learning in track planning.
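The non-sparse reward design mentioned in the abstract can be illustrated with a minimal sketch. The function name, weights, and terms below are illustrative assumptions, not the paper's actual formula; the idea is simply that the agent receives feedback at every step, being rewarded for per-step progress toward the goal and penalized continuously inside threat zones, rather than only at the terminal state.

```python
import math

# Hypothetical non-sparse (dense) reward for UAV track planning.
# Weights and terms are illustrative assumptions, not the paper's formula.
def dense_reward(pos, goal, threats, threat_radius=5.0,
                 w_progress=1.0, w_threat=10.0, prev_dist=None):
    dist = math.dist(pos, goal)
    # Progress term: reward for reducing distance to the goal this step,
    # so the agent gets feedback before ever reaching the target.
    progress = 0.0 if prev_dist is None else (prev_dist - dist)
    # Threat term: penalty that grows as the UAV moves deeper into a
    # threat zone, instead of a single sparse penalty on collision.
    penalty = 0.0
    for t in threats:
        d = math.dist(pos, t)
        if d < threat_radius:
            penalty += (threat_radius - d) / threat_radius
    return w_progress * progress - w_threat * penalty, dist
```

For example, moving from 11.0 to 10.0 units away from the goal with no nearby threats yields a reward of 1.0, while the same step inside a threat zone yields a negative reward.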

Key words: deep deterministic policy gradient algorithm, unmanned aerial vehicle, track planning, deep reinforcement learning, artificial bee colony algorithm
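The artificial bee colony (ABC) mechanism that the abstract combines with the DDPG parameter update can be sketched at a high level. The snippet below is a generic, illustrative ABC search (employed-bee and scout-bee phases only; the onlooker phase is omitted for brevity) minimizing a stand-in objective over a 1-D interval; in the paper's setting the objective would instead score a candidate DDPG parameter configuration by its training performance. All names and the search space are assumptions.

```python
import random

# Minimal artificial bee colony (ABC) sketch: food sources are candidate
# solutions; employed bees refine them, scout bees replace stagnant ones.
def abc_minimize(fitness, lo, hi, n_food=10, limit=5, iters=50, seed=0):
    rng = random.Random(seed)
    foods = [rng.uniform(lo, hi) for _ in range(n_food)]
    trials = [0] * n_food
    best = min(foods, key=fitness)
    for _ in range(iters):
        # Employed-bee phase: perturb each food source relative to a
        # random peer and keep the candidate only if it improves fitness.
        for i in range(n_food):
            k = rng.randrange(n_food)
            cand = foods[i] + rng.uniform(-1, 1) * (foods[i] - foods[k])
            cand = min(max(cand, lo), hi)
            if fitness(cand) < fitness(foods[i]):
                foods[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Scout-bee phase: abandon sources that stagnated past the trial
        # limit and re-sample them uniformly from the search space.
        for i in range(n_food):
            if trials[i] > limit:
                foods[i], trials[i] = rng.uniform(lo, hi), 0
        best = min(best, min(foods, key=fitness), key=fitness)
    return best
```

Using a DDPG training-score function as `fitness` would reproduce the kind of parameter-update search the abstract describes, at the cost of one training evaluation per candidate.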