Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (11): 254-259. DOI: 10.3778/j.issn.1002-8331.2003-0163

• Engineering and Applications •

Application of an Improved TD3 Algorithm to Obstacle Avoidance for Quad-Rotor UAVs

TANG Lei (唐蕾), LIU Guangzhong (刘广钟)

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Online: 2021-05-31 Published: 2021-06-01

Abstract:

To improve the intelligent obstacle avoidance performance of unmanned aerial vehicle (UAV) systems, an improved algorithm, Improved Twin Delayed Deep Deterministic Policy Gradient (I-TD3), is proposed on the basis of Twin Delayed Deep Deterministic Policy Gradient (TD3). The algorithm sets up two experience buffer pools to separate successful flight experience from failed flight experience. According to the different purpose of each pool, one is combined with Prioritized Experience Replay and the other with ordinary Experience Replay, which improves the sampling efficiency of effective experience and alleviates the low training efficiency caused by an excessive share of invalid experience. The reward function is also improved to address the poor training results caused by an unreasonable reward design. Simulation experiments on the AirSim platform show that, for quad-rotor UAV obstacle avoidance, I-TD3 outperforms both the TD3 algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm.

Key words: Twin Delayed Deep Deterministic Policy Gradient (TD3), prioritized experience replay, obstacle avoidance, quad-rotor unmanned aerial vehicle
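
The two-pool replay scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which pool uses prioritized sampling, how priorities are computed, or how the two pools are mixed in a batch, so everything below is an assumption made for illustration: the class name `DualReplayBuffer`, the choice to prioritize the success pool by TD error, the exponent `alpha`, and the fixed `success_ratio` are all hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two experience pools sampled with different strategies.

    Assumptions (not from the paper): the success pool is sampled in
    proportion to |TD error|^alpha, the failure pool is sampled uniformly,
    and each batch takes a fixed share from the success pool.
    """

    def __init__(self, capacity, alpha=0.6, success_ratio=0.5, seed=None):
        self.alpha = alpha
        self.success_ratio = success_ratio
        # Both deques share one maxlen, so items and priorities stay aligned.
        self.success = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition, succeeded, td_error=1.0):
        """Store a transition in the pool matching the flight outcome."""
        if succeeded:
            self.success.append(transition)
            # Small constant keeps every priority strictly positive.
            self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)
        else:
            self.failure.append(transition)

    def sample(self, batch_size):
        """Draw a mixed batch; fall back to one pool if the other is empty."""
        if not self.success and not self.failure:
            return []
        if not self.failure:
            n_success = batch_size
        elif not self.success:
            n_success = 0
        else:
            n_success = int(batch_size * self.success_ratio)
        n_failure = batch_size - n_success

        batch = []
        if n_success:
            # Prioritized draw (with replacement) from the success pool.
            batch += self.rng.choices(
                self.success, weights=self.priorities, k=n_success
            )
        if n_failure:
            # Uniform draw (with replacement) from the failure pool.
            batch += self.rng.choices(self.failure, k=n_failure)
        return batch
```

In a training loop, each transition would be added with `succeeded` set from the flight outcome, and `sample(batch_size)` would supply the minibatch for the TD3 critic and actor updates. A full prioritized replay implementation would normally also refresh priorities after each update and apply importance-sampling weights, which this sketch omits for brevity.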