Route Planning Method Based on Least-Squares Policy Iteration for Unmanned Aerial Vehicle

doi:10.3778/j.issn.1002-8331.1808-0413

Abstract

Traditional reinforcement learning methods discretize the state space and therefore cannot guarantee trajectory accuracy in complex flight applications. This paper presents a route planning method over a continuous state space based on Least-Squares Policy Iteration (LSPI). A function approximator represents the value function, which preserves trajectory accuracy without discretizing the space. The policy is generated offline from samples, and LSPI evaluates and improves it directly. Simulation results show that, compared with Q-learning, the trajectory planned by LSPI is smoother and better suited to actual flight.

Key words: unmanned aerial vehicle, route planning, reinforcement learning, least squares method, Q-learning, continuous state spaces

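Abstract (translated from the Chinese version): Traditional reinforcement learning methods discretize the state space and therefore cannot guarantee the trajectory accuracy of an unmanned aerial vehicle in complex application scenarios. To address this problem, the Least-Squares Policy Iteration (LSPI) algorithm is used to study route planning over a continuous state space. The algorithm approximates the action-value function with a parameterized linear function approximator, so no space discretization is needed and trajectory accuracy improves. It computes the policy offline from sample data, evaluating and improving the policy directly. Comparative simulation experiments against the Q-learning algorithm show that the three-dimensional trajectory planned by LSPI is smoother and better suited to actual flight.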
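
The offline evaluate-and-improve loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional state, the radial basis features, the reward, the discount factor, and every name below (`features`, `step`, `lstdq`, `lspi`) are assumptions chosen only to show how LSPI fits a linear action-value function from a fixed batch of samples without discretizing the state.

```python
import numpy as np

# Hypothetical toy problem: a scalar continuous state in [0, 1], three moves.
ACTIONS = np.array([-0.1, 0.0, 0.1])   # assumed step commands
CENTERS = np.linspace(0.0, 1.0, 6)     # assumed RBF centers
GOAL, GAMMA = 0.8, 0.9                 # assumed target state and discount


def features(s, a_idx):
    """Bias plus RBFs of the continuous state, placed in the chosen action's block."""
    rbf = np.exp(-((s - CENTERS) ** 2) / (2 * 0.15 ** 2))
    block = np.concatenate(([1.0], rbf))
    phi = np.zeros(len(ACTIONS) * block.size)
    phi[a_idx * block.size:(a_idx + 1) * block.size] = block
    return phi


def step(s, a_idx):
    """Assumed dynamics: move, clip to [0, 1], penalize distance to the goal."""
    s_next = float(np.clip(s + ACTIONS[a_idx], 0.0, 1.0))
    return s_next, -abs(s_next - GOAL)


def greedy(w, s):
    """Policy improvement: pick the action with the largest approximate Q-value."""
    return int(np.argmax([features(s, a) @ w for a in range(len(ACTIONS))]))


def lstdq(samples, w):
    """Policy evaluation: solve A w' = b for the policy that is greedy in w."""
    k = w.size
    A = 1e-3 * np.eye(k)               # small ridge term for numerical stability
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, greedy(w, s_next))
        A += np.outer(phi, phi - GAMMA * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)


def lspi(samples, iters=20, tol=1e-6):
    """Alternate evaluation and improvement on the same fixed sample batch."""
    w = np.zeros(features(0.0, 0).size)
    for _ in range(iters):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# Offline sample collection with uniformly random states and actions.
rng = np.random.default_rng(0)
samples = []
for _ in range(2000):
    s = rng.uniform(0.0, 1.0)
    a = int(rng.integers(len(ACTIONS)))
    s_next, r = step(s, a)
    samples.append((s, a, r, s_next))

w = lspi(samples)
```

Once `w` is fitted, `greedy(w, s)` can be queried at any real-valued state, which is the property the abstract contrasts with a discretized Q-learning table. A real route planner would replace the scalar state with a three-dimensional position and a task-specific reward.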


CHEN Xiaoqian, LIU Ruixiang. Route Planning Method Based on Least-Squares Policy Iteration for Unmanned Aerial Vehicle[J]. Computer Engineering and Applications, 2020, 56(1): 191-195.


