Computer Engineering and Applications, 2025, Vol. 61, Issue (16): 106-115. DOI: 10.3778/j.issn.1002-8331.2501-0027

• Theory, Research and Development •

Grasp of Moving Objects by UR3 Manipulator in Structured Environment

LUO Guoqing, YUAN Qingni, QU Pengju, WU Xingjie

  1. Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, Guiyang 550025, China
  • Online: 2025-08-15  Published: 2025-08-15

Abstract: To address the insufficient autonomous decision-making ability, low environmental adaptability, and low learning efficiency of manipulators grasping moving objects in structured environments, as well as the coordination and path-planning problems in cooperative manipulator operation, this paper integrates a double replay buffer (DRB) mechanism with the soft actor-critic (SAC) reinforcement learning algorithm and proposes a DRB-SAC (soft actor-critic with double replay buffer) deep reinforcement learning method for grasping moving objects with a manipulator. First, a manipulator system for grasping moving objects is built. Then, the improved deep reinforcement learning control strategy DRB-SAC is proposed. The strategy is built on a Markov decision process model: the action space and state space are defined to provide a framework for acting on and observing the environment, the goals and constraints of the manipulator control task are specified, the training strategy and reward function are designed, and deep neural networks are used to approximate the action-value function and the policy function, enabling intelligent, adaptive grasping decisions while the manipulator moves; the double replay buffer mechanism is introduced to further improve the stability and generalization ability of the algorithm. Finally, comparative simulation and physical experiments show that the method converges well and is superior in terms of exploration reward and the degree of completion of the grasping action.
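The abstract does not detail how the two replay buffers are organized or sampled. As one plausible reading, the Python sketch below keeps a general buffer of all transitions alongside a smaller buffer reserved for transitions from successful grasp episodes, and draws mixed mini-batches for the SAC actor-critic update. The class name, capacities, and mixing ratio are illustrative assumptions, not the paper's implementation.

import random
from collections import deque

class DoubleReplayBuffer:
    """Illustrative double replay buffer for an off-policy learner such as SAC.

    Assumption: one FIFO buffer holds every transition, a second smaller buffer
    holds transitions from successful grasps, and mini-batches mix the two at a
    fixed ratio. The paper may organize its buffers differently.
    """

    def __init__(self, capacity=100_000, success_capacity=20_000, success_ratio=0.3):
        self.main = deque(maxlen=capacity)              # every transition observed
        self.success = deque(maxlen=success_capacity)   # transitions from successful grasps
        self.success_ratio = success_ratio               # share of each batch from the success buffer

    def add(self, state, action, reward, next_state, done, success=False):
        transition = (state, action, reward, next_state, done)
        self.main.append(transition)
        if success:
            self.success.append(transition)

    def sample(self, batch_size):
        # Mix the two buffers; fall back to the main buffer while the success
        # buffer is still empty or short early in training.
        n_success = min(int(batch_size * self.success_ratio), len(self.success))
        n_main = min(batch_size - n_success, len(self.main))
        batch = random.sample(self.main, n_main) + random.sample(self.success, n_success)
        random.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.main)

# Usage inside an SAC-style training loop (the entropy-regularized actor and
# critic updates themselves are omitted here):
#   buffer = DoubleReplayBuffer()
#   buffer.add(s, a, r, s_next, done, success=grasp_succeeded)
#   if len(buffer) > 1_000:
#       batch = buffer.sample(256)   # fed to the actor and critic networks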

Key words: reinforcement learning, moving object grasping, double replay buffer, CoppeliaSim simulation
