Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (6): 318-325. DOI: 10.3778/j.issn.1002-8331.2207-0159

• Engineering and Applications •

Research on Motion Control Decision-Making of a Manipulator Based on Multi-Agent Reinforcement Learning

YANG Bo, WANG Kun, MA Xiangxiang, FAN Biao, XU Lei, YAN Hao

  1. School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
    2. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
  • Online: 2023-03-15  Published: 2023-03-15

Abstract: Traditional motion-control algorithms for manipulators adapt poorly to their environment and are inefficient. Reinforcement learning can address this: the agent explores through trial and error, and a reward function adjusts the neural-network parameters that control the manipulator's motion. Because a real manipulator cannot be given a trial-and-error environment, the Unity engine is used to build a digital-twin simulation environment of the manipulator, in which observation state variables and a reward-function mechanism are defined. Within this environment, the M-PPO algorithm, which combines PPO (proximal policy optimization) with multiple agents, is proposed to accelerate training and to realize intelligent motion control of the manipulator through reinforcement learning, so that the end-effector effectively avoids obstacles and quickly reaches the target object's position. Experimental results comparing this algorithm with M-SAC (multiple agents combined with Soft Actor-Critic) and plain PPO verify the effectiveness and superiority of M-PPO for tuning manipulator motion-control decisions in different environments. The twin thus plans and decides autonomously and, in turn, drives the physical manipulator to move synchronously.
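The abstract mentions defining observation state variables and a reward-function mechanism in the Unity scene but does not give the concrete terms on this page. As an illustrative sketch only, a common shaping scheme consistent with the stated goals (reach the target quickly, avoid obstacles) could look like the following; the function name, constants, and inputs are assumptions, not the paper's actual design:

```python
def shaped_reward(dist_to_target: float, hit_obstacle: bool, reached: bool) -> float:
    """Toy per-step reward for a reach-and-avoid task (illustrative values only)."""
    if hit_obstacle:
        return -1.0                    # large penalty: a collision ends the attempt
    if reached:
        return 1.0                     # success bonus when the end-effector reaches the target
    return -0.01 * dist_to_target      # dense penalty that shrinks as the arm approaches
```

Dense distance-based terms like the last line speed up early learning, while the sparse success and collision terms encode the actual task.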

Key words: reinforcement learning, Unity engine, motion control, M-PPO algorithm, multi-agent

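M-PPO, as described above, accelerates training by letting several identical agent copies gather experience for one shared PPO policy. The sketch below is a minimal, dependency-free illustration of those two ingredients, PPO's clipped surrogate objective and pooled multi-agent rollout collection; all names and the toy `step_env` callback are assumptions, not the paper's implementation:

```python
import math

def ppo_clip_objective(old_logp: float, new_logp: float, advantage: float,
                       eps: float = 0.2) -> float:
    """PPO clipped surrogate for one transition (the value to be maximized)."""
    ratio = math.exp(new_logp - old_logp)              # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Pessimistic lower bound: removes the incentive for oversized policy updates.
    return min(ratio * advantage, clipped_ratio * advantage)

def collect_rollouts(n_agents: int, horizon: int, step_env):
    """Pool transitions from n_agents identical environment copies.

    step_env(agent_id, t) -> (obs, action, reward) stands in for one
    simulation step of one manipulator copy in the shared Unity scene.
    """
    buffer = []
    for agent_id in range(n_agents):                   # all copies share one policy
        for t in range(horizon):
            buffer.append(step_env(agent_id, t))
    return buffer                                      # n_agents * horizon samples per update
```

With, say, 16 scene copies and a 128-step horizon, each policy update sees 16 × 128 transitions instead of 128, which is where the wall-clock speedup over single-agent PPO comes from.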