Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (1): 191-195. DOI: 10.3778/j.issn.1002-8331.1808-0413

• Pattern Recognition and Artificial Intelligence •

Route Planning Method Based on Least-Squares Policy Iteration for Unmanned Aerial Vehicle

CHEN Xiaoqian, LIU Ruixiang

  1. College of Smart City, Beijing Union University, Beijing 100101, China
  • Online: 2020-01-01 Published: 2020-01-02

Abstract: Traditional reinforcement learning methods discretize the state space and therefore cannot guarantee trajectory accuracy for unmanned aerial vehicles (UAVs) in complex application scenarios. To address this, the paper applies the Least-Squares Policy Iteration (LSPI) algorithm to route planning over a continuous state space. LSPI represents the action-value function with a parameterized linear function approximator, so no spatial discretization is needed and trajectory accuracy improves. The policy is computed offline from sample data, and is evaluated and improved directly. Comparative simulation experiments against Q-learning show that the three-dimensional trajectories planned by LSPI are smoother, which favors actual flight.

Key words: unmanned aerial vehicle, route planning, reinforcement learning, least squares method, Q-learning, continuous state space