With the rise of indoor navigation and positioning technology, Unmanned Aerial Vehicle (UAV) technology has seen unprecedented development in indoor applications, which places higher demands on UAV path planning. To address the complexity of indoor spaces and the slow convergence of existing reinforcement learning algorithms, this paper proposes an integrated method based on reinforcement learning. First, the main obstacles and the nodes surrounding them are identified from the line connecting the start and end coordinates, which reduces the complexity of the search space. Second, to indicate the direction of the target point and speed up convergence, a direction trend function is constructed from the mathematical relationship between the current position and the target point and applied during Q-value initialization. Finally, the optimized algorithm is simulated and verified on a three-dimensional grid map. The simulation results show that, compared with the standard Q-learning algorithm, the improved Q-learning algorithm reduces the number of spatial search nodes by 55.49% and shortens the convergence time to 98.57%.
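The abstract does not give the exact form of the direction trend function, the action set, or the reward scheme, so the following is only a minimal Python sketch under stated assumptions: a hypothetical cosine-based `direction_trend` (cosine between a move and the vector to the goal), six axis-aligned moves, and hypothetical rewards of +100 at the goal, -1 per step and -10 for hitting an obstacle or the grid boundary. It shows standard tabular Q-learning on a 3D grid in which Q is initialized from the trend instead of zeros; the main-obstacle screening step is not modelled here.

```python
import math
import random

# Assumption: six axis-aligned unit moves (the abstract does not state the action set).
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def direction_trend(state, action, goal):
    """Hypothetical trend: cosine between the move and the vector to the goal."""
    to_goal = [g - s for g, s in zip(goal, state)]
    norm = math.sqrt(sum(c * c for c in to_goal))
    if norm == 0:
        return 0.0
    # Each action is a unit vector, so dot / |to_goal| is the cosine.
    return sum(a * c for a, c in zip(action, to_goal)) / norm


def init_q(size, goal):
    """Initialize Q(s, a) with the direction trend instead of zeros."""
    q = {}
    for x in range(size):
        for y in range(size):
            for z in range(size):
                s = (x, y, z)
                q[s] = [direction_trend(s, a, goal) for a in ACTIONS]
    return q


def step(state, action, size, obstacles, goal):
    """Apply a move; blocked moves keep the UAV in place with a penalty."""
    nxt = tuple(s + a for s, a in zip(state, action))
    if any(c < 0 or c >= size for c in nxt) or nxt in obstacles:
        return state, -10.0, False
    if nxt == goal:
        return nxt, 100.0, True
    return nxt, -1.0, False


def q_learning(size, start, goal, obstacles, episodes=2000,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = init_q(size, goal)
    for _ in range(episodes):
        s = start
        for _ in range(4 * size ** 3):  # cap the episode length
            if rng.random() < eps:  # epsilon-greedy exploration
                i = rng.randrange(len(ACTIONS))
            else:
                i = max(range(len(ACTIONS)), key=lambda k: q[s][k])
            nxt, r, done = step(s, ACTIONS[i], size, obstacles, goal)
            target = r if done else r + gamma * max(q[nxt])
            q[s][i] += alpha * (target - q[s][i])  # temporal-difference update
            s = nxt
            if done:
                break
    return q


def greedy_path(q, start, goal, size, obstacles):
    """Follow the learned greedy policy from start toward goal."""
    path, s = [start], start
    for _ in range(size ** 3):
        if s == goal:
            break
        i = max(range(len(ACTIONS)), key=lambda k: q[s][k])
        s, _, _ = step(s, ACTIONS[i], size, obstacles, goal)
        path.append(s)
    return path


# Usage on a hypothetical 5x5x5 grid with a 3x3 wall in the plane x = 2.
SIZE, START, GOAL = 5, (0, 0, 0), (4, 4, 4)
WALL = {(2, y, z) for y in range(3) for z in range(3)}
q_table = q_learning(SIZE, START, GOAL, WALL)
path = greedy_path(q_table, START, GOAL, SIZE, WALL)
print(len(path) - 1, path[-1])
```

Because the trend values lie in [-1, 1], they only bias the early greedy choices toward the goal; as rewards propagate, the temporal-difference updates override them.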

%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2004-0363