Unmanned Aerial Vehicle Track Planning Algorithm Based on Improved DDPG

doi:10.3778/j.issn.1002-8331.2106-0054

Abstract

Abstract: Unknown threats encountered during UAV flight raise the processing complexity of intelligent algorithms and make real-time track planning difficult, and tuning the parameters of the DDPG algorithm in deep reinforcement learning incurs a high time cost. To address both problems, an improved DDPG track planning algorithm is proposed. A flight scene model is constructed for the UAV track planning problem. The action space is built according to flight dynamics theory, and the reward function is designed on the basis of a non-sparse idea. The artificial bee colony algorithm is combined with DDPG to improve the update mechanism of the DDPG model parameters, and the network model is trained to realize UAV track decision-making and control. Simulation results show that the overall training time of the proposed algorithm is only 1.98 times the average single-run training time of the original algorithm, which greatly improves network training efficiency and reduces the time cost. Moreover, while satisfying real-time flight requirements, the proposed algorithm meets UAV track quality requirements, providing a new idea for promoting the practical application of deep reinforcement learning in track planning.

Key words: deep deterministic policy gradient algorithm, unmanned aerial vehicle, track planning, deep reinforcement learning, artificial bee colony algorithm
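The abstract states that the artificial bee colony (ABC) algorithm is combined with DDPG to improve how model parameters are updated, but it does not say which parameters are searched or how candidates are scored. The following is therefore only a minimal sketch of a generic ABC search over a real-valued parameter vector, not the authors' implementation. The names `abc_minimize` and `surrogate_training_cost`, the bounds, and the quadratic stand-in objective are all assumptions made for illustration; in practice the objective would involve evaluating a trained DDPG agent.

```python
import random


def abc_minimize(objective, bounds, n_sources=10, limit=5, n_iters=100, seed=0):
    """Minimal artificial bee colony search that minimizes `objective`.

    `bounds` is a list of (low, high) pairs, one per parameter.
    Assumes the objective returns non-negative costs.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvements per source

    def try_improve(i):
        # Perturb one coordinate of source i relative to a random other source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        # Greedy selection: keep the candidate only if it lowers the cost.
        c = objective(cand)
        if c < costs[i]:
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    i_best = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = list(sources[i_best]), costs[i_best]

    for _ in range(n_iters):
        # Employed-bee phase: every source gets one local search step.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker phase: better sources are sampled more often.
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=fitness)[0]
            try_improve(i)
        # Record the best source before scouts may discard it.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = list(sources[i_best]), costs[i_best]
        # Scout phase: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0

    return best, best_cost


def surrogate_training_cost(params):
    """Hypothetical stand-in for an expensive DDPG evaluation.

    `params` are assumed to be log10 learning rates for the actor and critic;
    the minimum at (-4, -3) is arbitrary and chosen only for the demo.
    """
    log_actor_lr, log_critic_lr = params
    return (log_actor_lr + 4.0) ** 2 + (log_critic_lr + 3.0) ** 2


if __name__ == "__main__":
    best, cost = abc_minimize(surrogate_training_cost, [(-6.0, -1.0), (-6.0, -1.0)])
    print("best parameters:", best, "cost:", cost)
```

Because each candidate is evaluated independently, a search of this shape can replace repeated manual tuning runs with a single automated loop, which is in line with the training-time comparison reported in the abstract.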

GAO Jingpeng, HU Xinyu, JIANG Zhiye. Unmanned Aerial Vehicle Track Planning Algorithm Based on Improved DDPG[J]. Computer Engineering and Applications, 2022, 58(8): 264-272.
