Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (16): 175-181. DOI: 10.3778/j.issn.1002-8331.2004-0363

• Pattern Recognition and Artificial Intelligence •

UAV Indoor 3D Track Planning Based on Improved Reinforcement Learning Algorithm

ZHANG Jun, ZHU Qingwei, YAN Junjie, WEN Bo

  1. College of Geomatics, Xi’an University of Science and Technology, Xi’an 710054, China
  2. Department of Psychology, Southwest University, Chongqing 400715, China
  • Online: 2021-08-15 Published: 2021-08-16

Abstract:

With the rise of indoor navigation and positioning technology, the use of Unmanned Aerial Vehicles (UAVs) in indoor environments has developed rapidly, placing higher demands on UAV track planning. Because indoor spaces are complex and existing reinforcement learning algorithms converge slowly, this paper proposes an integrated method based on reinforcement learning. First, the line connecting the given start and end coordinates is used to identify the main obstacles and the nodes surrounding them, which reduces the search over useless nodes. Second, during Q-value initialization, a direction trend function is constructed from a mathematical relationship to determine the direction of the target point and thereby speed up convergence. Finally, the optimized algorithm is simulated and verified on a three-dimensional grid map. The simulation results show that, compared with the standard Q-learning algorithm, the improved Q-learning algorithm reduces the number of spatial search nodes by 55.49% and shortens the convergence time by 98.57%.
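The first step described in the abstract, picking out the main obstacles from the start-to-end line and keeping only the nodes around them, can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption: the grid is a set of occupied integer cells, the line is sampled densely rather than traversed exactly, "surrounding" is taken to mean the free 26-neighbours of a main obstacle cell, and the function names are hypothetical.

```python
from itertools import product


def cells_on_segment(start, goal, samples_per_cell=4):
    """Grid cells visited by densely sampling the straight segment start -> goal.

    Assumption: dense sampling is an approximation of an exact voxel traversal.
    """
    steps = max(1, samples_per_cell * max(abs(g - s) for s, g in zip(start, goal)))
    cells = []
    for i in range(steps + 1):
        t = i / steps
        cell = tuple(round(s + t * (g - s)) for s, g in zip(start, goal))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def main_obstacles_and_surrounding(start, goal, obstacles, shape):
    """Return (main, surrounding).

    main: obstacle cells crossed by the start-goal segment.
    surrounding: free in-bounds cells among the 26 neighbours of each main cell.
    Assumption: this neighbourhood definition is illustrative, not the paper's.
    """
    main = {c for c in cells_on_segment(start, goal) if c in obstacles}
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    surrounding = set()
    for cell in main:
        for d in offsets:
            n = tuple(c + dc for c, dc in zip(cell, d))
            inside = all(0 <= v < size for v, size in zip(n, shape))
            if inside and n not in obstacles:
                surrounding.add(n)
    return main, surrounding
```

Under these assumptions, an obstacle that the line never touches contributes no candidate nodes, which is the sense in which the search over useless nodes shrinks.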

Key words: track planning, target direction, Main Obstacles and Surrounding Point (MO-SP), Unmanned Aerial Vehicle (UAV), reinforcement learning
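The "target direction" idea, seeding the Q table so that actions pointing toward the goal start with higher values, might look roughly like the sketch below. The abstract does not state the paper's direction trend function, so the cosine form used here is an assumption, as are the six axis-aligned actions, the `scale` parameter, and all function names.

```python
import math

# Assumption: six axis-aligned unit moves in a 3D grid.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def direction_trend(state, goal, action):
    """Cosine of the angle between a unit move and the vector from state to goal.

    Returns 0.0 at the goal itself, where no direction is defined.
    """
    to_goal = [g - s for s, g in zip(state, goal)]
    norm = math.sqrt(sum(v * v for v in to_goal))
    if norm == 0:
        return 0.0
    return sum(a * v for a, v in zip(action, to_goal)) / norm


def init_q_table(shape, goal, scale=1.0):
    """Build a Q table whose initial values favour moves toward the goal."""
    q = {}
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(shape[2]):
                state = (x, y, z)
                q[state] = [scale * direction_trend(state, goal, a) for a in ACTIONS]
    return q
```

With a table initialized this way, a greedy policy already heads toward the goal before any update has been made, and the usual Q-learning update can then refine the values around obstacles.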