Self-Learning Gait Planning Method for Biped Robot Using DDPG

doi:10.3778/j.issn.1002-8331.1912-0382

Abstract

To solve the high-dimensional nonlinear programming problem in walking control of multi-degree-of-freedom biped robots, and to tap the autonomous motion potential of biped robots in uncertain environments, an improved gait planning scheme for biped robots based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed. The multi-joint degree-of-freedom control problem of the biped robot is transformed into a multi-objective optimization problem over nonlinear functions and solved with the DDPG algorithm. To address the slow convergence of a globally approximating network, a Radial Basis Function (RBF) neural network is used to compute the nonlinear function values, gradient descent is used to update the network weights, and a SumTree is used to screen high-quality samples. The biped robot is trained in simulation on a joint platform built from ROS, Gazebo and Tensorflow. The simulation data show that the improved DDPG algorithm reaches the maximum cumulative reward 45.7% earlier on average, the success rate rises by 8.9%, and the joint posture angles after training are smoother.

Key words: biped robot, gait planning, Deep Deterministic Policy Gradient (DDPG), Radial Basis Function (RBF) neural network, SumTree, Gazebo
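
The abstract states only that a SumTree is used to screen high-quality samples; it does not give the data structure's details. The following is a minimal sketch of how a SumTree is commonly used for priority-proportional sampling from a replay buffer. The class name `SumTree`, its method names, and the idea that each priority is a positive number supplied by the caller (for example derived from a TD error) are assumptions for illustration, not the authors' implementation.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold sample priorities,
    every internal node holds the sum of its two children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next data slot to overwrite

    def total(self):
        # The root stores the sum of all priorities.
        return self.tree[0]

    def update(self, leaf, priority):
        # Set a leaf priority and propagate the difference up to the root.
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def add(self, priority, sample):
        # Store a sample, overwriting the oldest one when the buffer is full.
        leaf = self.write + self.capacity - 1
        self.data[self.write] = sample
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        # Walk down from the root to the leaf whose cumulative
        # priority interval contains `value`.
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def sample(self, batch_size, rng=random):
        # Stratified sampling: one draw from each equal slice of the total.
        segment = self.total() / batch_size
        return [self.get(rng.uniform(i * segment, (i + 1) * segment))
                for i in range(batch_size)]


tree = SumTree(4)
for priority, transition in zip([1.0, 2.0, 3.0, 4.0], "abcd"):
    tree.add(priority, transition)
batch = tree.sample(2)  # high-priority transitions are drawn more often
```

Because both `update` and `get` touch only one root-to-leaf path, each costs O(log n) in the buffer capacity, which is the usual reason for preferring this structure over re-sorting the buffer by priority.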

ZHOU Youhang, ZHAO Hanyun, LIU Hanjiang, LI Yuze, XIAO Yuqin. Self-Learning Gait Planning Method for Biped Robot Using DDPG[J]. Computer Engineering and Applications, 2021, 57(6): 254-259.


