Mars Unmanned Aerial Vehicles Control with Deep Deterministic Policy Gradient

doi:10.3778/j.issn.1002-8331.2112-0528

Abstract

To reduce the dependence of controller design on the dynamic model of a Mars unmanned aerial vehicle (UAV) and to raise the intelligence level of the Mars UAV control system, a reinforcement learning-based controller for Mars UAVs is proposed. The controller consists of neural networks trained with the deep deterministic policy gradient (DDPG) algorithm; after training, it yields a control strategy that meets the control requirements given the current state and target. Simulation results show that the DDPG-based controller steers the Mars UAV to a specified position autonomously, without deriving the UAV dynamic model. Its control precision and adjustment time match those of a proportional-integral-derivative (PID) controller, which verifies the effectiveness of the DDPG-based controller. In addition, when the controlled object's model changes or an external disturbance is present, the DDPG-based controller still completes the task stably and outperforms the PID controller, indicating good robustness.
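As a rough illustration of the learning scheme named in the abstract, the sketch below shows two standard DDPG building blocks in plain Python: the critic's temporal-difference target and the soft (Polyak) update of the target networks. It is not the authors' implementation. The function names, the default values of `gamma` and `tau`, and the `position_reward` shaping (negative squared distance to the target position) are assumptions made for this example; the abstract does not state the paper's reward function, network structure, or hyperparameters.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def td_target(reward: float,
              next_state: Sequence[float],
              done: bool,
              target_actor: Callable[[Sequence[float]], Vector],
              target_critic: Callable[[Sequence[float], Sequence[float]], float],
              gamma: float = 0.99) -> float:
    """Critic regression target: y = r + gamma * Q'(s', mu'(s')).

    For a terminal transition the bootstrap term is dropped, so y = r.
    gamma = 0.99 is an assumed default, not a value from the paper.
    """
    if done:
        return reward
    next_action = target_actor(next_state)  # deterministic target policy mu'
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target_params: Sequence[float],
                online_params: Sequence[float],
                tau: float = 0.005) -> Vector:
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target.

    tau = 0.005 is an assumed default, not a value from the paper.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def position_reward(position: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical reward: negative squared distance to the target position."""
    return -sum((p - t) ** 2 for p, t in zip(position, target))
```

In a full training loop, the critic would be regressed toward `td_target` on minibatches drawn from a replay buffer, the actor would be updated along the critic's gradient with respect to the action, and `soft_update` would be applied to both target networks after each step.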

Key words: Mars unmanned aerial vehicle (UAV), reinforcement learning, autonomous control, deep deterministic policy gradient, strategy optimization

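Chinese abstract (translated): To reduce the dependence of controller design on the Mars UAV dynamic model and raise the intelligence level of the Mars UAV control system, a position and attitude controller for the Mars UAV with autonomous learning capability is proposed, built on a reinforcement learning (RL) algorithm. The controller is composed of neural networks and learns with the deep deterministic policy gradient (DDPG) algorithm, continuously optimizing its control policy until it obtains one that meets the control requirements. Simulation results show that, without deriving a model of the controlled object, the DDPG-based controller learns to stabilize the Mars UAV at the target position autonomously, and that its control precision, adjustment time, and other performance measures are better than those of a proportional-integral-derivative (PID) controller, which verifies the effectiveness of the DDPG-based controller. Moreover, when the controlled object's model changes or external disturbances are present, the DDPG-based controller still completes the task stably and outperforms the PID controller, showing that it has good robustness.

Chinese key words (translated): Mars UAV, reinforcement learning, autonomous control, deep deterministic policy gradient, policy optimization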

SUN Dan, ZHENG Jianhua, GAO Dong, HAN Peng. Mars Unmanned Aerial Vehicles Control with Deep Deterministic Policy Gradient[J]. Computer Engineering and Applications, 2023, 59(8): 288-296.

孙丹, 郑建华, 高东, 韩鹏. 深度确定性策略梯度学习的火星无人机控制[J]. 计算机工程与应用, 2023, 59(8): 288-296.
