Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (1): 308-316. DOI: 10.3778/j.issn.1002-8331.2107-0017

• Engineering and Applications •

Application of RVO-DDPG Algorithm in Multi-UAV Consolidation Route Planning

YANG Xiuxia, GAO Hengjie, LIU Wei, ZHANG Yi   

  1. School of Coast Guard, Naval Aviation University, Yantai, Shandong 264001, China
  2. School of Combat Service, Naval Aviation University, Yantai, Shandong 264001, China
  • Online: 2023-01-01 Published: 2023-01-01

RVO-DDPG算法在多UAV集结航路规划的应用

杨秀霞,高恒杰,刘伟,张毅   

  1. 海军航空大学 岸防兵学院,山东 烟台 264001
  2. 海军航空大学 作战勤务学院,山东 烟台 264001

Abstract: Traditional intelligent optimization algorithms for multi-UAV assembly route planning in uncertain, complex environments suffer from heavy computation and long run times. To address this, a deep deterministic policy gradient (DDPG) algorithm based on the reciprocal velocity obstacle (RVO) method is proposed. For dynamic obstacles in uncertain environments and for UAVs in the formation, the UAV heading is adjusted by the velocity obstacle method to avoid collisions, which improves the convergence speed of the algorithm. A reward function based on comprehensive cost is designed, which transforms the multi-objective optimization problem in multi-UAV route planning into a reward function design problem for the DDPG algorithm. The performance of the algorithm is verified through simulation on the PyCharm software platform, and the RVO-DDPG algorithm is compared with several other algorithms. The simulation experiments show that the RVO-DDPG algorithm makes decisions faster and is more practical.

Key words: unmanned aerial vehicle (UAV), route planning, formation assembly, deep deterministic policy gradient (DDPG), reciprocal velocity obstacle (RVO)
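
The heading adjustment by the velocity obstacle method mentioned in the abstract can be illustrated with a minimal 2D sketch. It is not the authors' implementation: the disk-shaped agents, the infinite time horizon, the fixed speed, the uniform sampling of candidate headings, and all function and parameter names (`in_velocity_obstacle`, `in_rvo`, `pick_heading`, `n_candidates`) are assumptions made for illustration. A candidate velocity v' for UAV A is treated as lying in the reciprocal velocity obstacle of B when 2v' - v_A - v_B, taken as a relative velocity, leads to a collision.

```python
import math


def in_velocity_obstacle(rel_pos, rel_vel, combined_radius):
    """Return True if keeping rel_vel forever brings the two disks into contact."""
    px, py = rel_pos
    vx, vy = rel_vel
    if px * px + py * py <= combined_radius ** 2:
        return True  # already overlapping
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return False  # no relative motion, the gap never closes
    # Time of closest approach along the relative-velocity ray.
    t = (px * vx + py * vy) / speed_sq
    if t <= 0.0:
        return False  # moving apart (or sideways)
    cx, cy = px - t * vx, py - t * vy
    return cx * cx + cy * cy < combined_radius ** 2


def in_rvo(pos_a, vel_a, pos_b, vel_b, radius_a, radius_b, candidate):
    """Return True if candidate velocity for A lies in the RVO induced by B."""
    rel_pos = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    # Reciprocity: each agent is assumed to take half of the avoidance effort.
    rel_vel = (2.0 * candidate[0] - vel_a[0] - vel_b[0],
               2.0 * candidate[1] - vel_a[1] - vel_b[1])
    return in_velocity_obstacle(rel_pos, rel_vel, radius_a + radius_b)


def pick_heading(pos_a, vel_a, pos_b, vel_b, radius_a, radius_b,
                 preferred_heading, speed, n_candidates=72):
    """Return the sampled heading closest to preferred_heading that is outside
    the RVO, or None if every sampled heading is blocked."""
    best, best_diff = None, None
    for k in range(n_candidates):
        heading = 2.0 * math.pi * k / n_candidates
        cand = (speed * math.cos(heading), speed * math.sin(heading))
        if in_rvo(pos_a, vel_a, pos_b, vel_b, radius_a, radius_b, cand):
            continue
        delta = heading - preferred_heading
        diff = abs(math.atan2(math.sin(delta), math.cos(delta)))  # wrapped angle
        if best is None or diff < best_diff:
            best, best_diff = heading, diff
    return best
```

In a learning loop, a check like `in_rvo` could be used to correct or mask the heading proposed by the actor network before it is executed; how the paper couples the two is not stated in the abstract.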

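Abstract (translated from Chinese): To address the heavy computation and long run times of traditional intelligent optimization algorithms for multi-UAV assembly route planning in uncertain, complex environments, a deep deterministic policy gradient (DDPG) algorithm based on the reciprocal velocity obstacle (RVO) method is proposed. The RVO method is introduced to guide the UAVs in avoiding collisions with obstacles in the uncertain environment, which effectively speeds up the convergence of the target actor network and improves the learning efficiency of the algorithm. A reward function based on comprehensive cost is designed, which transforms the multi-objective optimization problem in multi-UAV route planning into a reward function design problem for the DDPG algorithm; this design effectively addresses the tendency of the traditional DDPG algorithm to fall into local optima. The performance of the algorithm is verified through simulation on the PyCharm software platform and compared with several other algorithms. The simulation experiments show that the RVO-DDPG algorithm makes decisions faster and is more practical.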

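Key words (translated from Chinese): unmanned aerial vehicle (UAV), route planning, formation assembly, deep deterministic policy gradient (DDPG) algorithm, reciprocal velocity obstacle (RVO) method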
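
The idea of folding several planning objectives into a single reward based on comprehensive cost can be sketched as a weighted sum. The abstract does not list the cost terms or their weights, so everything below is hypothetical: the three terms (progress toward the assembly point, proximity to threats, heading change), the weights in `CostWeights`, and the arrival bonus are assumptions chosen only to show the shape of such a function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Hypothetical weights; the paper's actual values are not given here."""
    distance: float = 1.0
    threat: float = 5.0
    turn: float = 0.1


def comprehensive_reward(prev_dist, dist, min_clearance, safe_clearance,
                         heading_change, weights=CostWeights(),
                         arrival_radius=1.0, arrival_bonus=100.0):
    """Scalar reward for one step, built from several cost terms.

    prev_dist, dist  distance to the assembly point before and after the step
    min_clearance    distance to the nearest obstacle or other UAV
    safe_clearance   clearance above which no threat cost is charged
    heading_change   heading change commanded in this step, in radians
    """
    progress = prev_dist - dist  # positive when the UAV moves closer
    # Threat cost grows linearly from 0 to 1 as clearance shrinks to zero.
    threat_cost = max(0.0, safe_clearance - min_clearance) / safe_clearance
    turn_cost = abs(heading_change)
    reward = (weights.distance * progress
              - weights.threat * threat_cost
              - weights.turn * turn_cost)
    if dist <= arrival_radius:
        reward += arrival_bonus  # one-off bonus for reaching the assembly point
    return reward
```

With a function of this kind, trading off the objectives becomes a matter of choosing weights, which is the sense in which the multi-objective problem is turned into a reward design problem.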