Computer Engineering and Applications, 2021, Vol. 57, Issue (6): 254-259. DOI: 10.3778/j.issn.1002-8331.1912-0382

Self-Learning Gait Planning Method for Biped Robot Using DDPG

ZHOU Youhang, ZHAO Hanyun, LIU Hanjiang, LI Yuze, XIAO Yuqin   

  1. School of Mechanical Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
  • Online:2021-03-15 Published:2021-03-12

采用DDPG的双足机器人自学习步态规划方法

周友行,赵晗妘,刘汉江,李昱泽,肖雨琴   

  1. 湘潭大学 机械工程学院,湖南 湘潭 411105

Abstract:

To address the high-dimensional nonlinear programming problem in walking control of multi-degree-of-freedom biped robots, and to exploit their potential for autonomous motion in uncertain environments, an improved gait planning scheme for biped robots based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed. The multi-joint degree-of-freedom control problem of the biped robot is transformed into a multi-objective optimization problem over a nonlinear function, which is solved with the DDPG algorithm. To overcome the slow convergence of global approximation networks, a Radial Basis Function (RBF) neural network is used to compute the nonlinear function values, a gradient descent algorithm is used to update the network weights, and SumTree is used to select high-quality samples. The biped robot is trained in simulation on a joint platform combining ROS, Gazebo and TensorFlow. Simulation results show that the improved DDPG algorithm reaches the maximum cumulative reward 45.7% earlier on average, the success rate increases by 8.9%, and the joint posture angles after training are smoother.

Key words: biped robot, gait planning, Deep Deterministic Policy Gradient (DDPG), Radial Basis Function (RBF) neural network, SumTree, Gazebo

摘要:

为解决多自由度双足机器人步行控制中高维非线性规划难题,挖掘不确定环境下双足机器人自主运动潜力,提出了一种改进的基于深度确定性策略梯度算法(DDPG)的双足机器人步态规划方案。把双足机器人多关节自由度控制问题转化为非线性函数的多目标优化求解问题,采用DDPG算法来求解。为解决全局逼近网络求解过程收敛慢的问题,采用径向基(RBF)神经网络进行非线性函数值的计算,并采用梯度下降算法更新神经网络权值,采用SumTree来筛选优质样本。通过ROS、Gazebo、Tensorflow的联合仿真平台对双足机器人进行了模拟学习训练。经数据仿真验证,改进后的DDPG算法平均达到最大累积奖励的时间提前了45.7%,成功率也提升了8.9%,且经训练后的关节姿态角度具有更好的平滑度。

关键词: 双足机器人, 步态规划, 深度确定性策略梯度算法(DDPG), 径向基函数(RBF)神经网络, SumTree, Gazebo