Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (2): 226-232. DOI: 10.3778/j.issn.1002-8331.1810-0021

• Engineering and Applications •

面向轨迹规划的深度强化学习奖励函数设计

李跃,邵振洲,赵振东,施智平,关永   

  1. 首都师范大学 信息工程学院,北京 100048
  2. 首都师范大学 轻型工业机械臂与安全验证北京市重点实验室,北京 100048
  3. 首都师范大学 成像技术北京市高精尖创新中心,北京 100048
  • 出版日期:2020-01-15 发布日期:2020-01-14

Design of Reward Function in Deep Reinforcement Learning for Trajectory Planning

LI Yue, SHAO Zhenzhou, ZHAO Zhendong, SHI Zhiping, GUAN Yong   

  1. College of Information Engineering, Capital Normal University, Beijing 100048, China
  2. Beijing Key Laboratory of Light Industrial Robot and Safety Verification, Capital Normal University, Beijing 100048, China
  3. Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing 100048, China
  • Online:2020-01-15 Published:2020-01-14

摘要: 现有基于深度强化学习的机械臂轨迹规划方法在未知环境中学习效率偏低,规划策略鲁棒性差。为了解决上述问题,提出了一种基于新型方位奖励函数的机械臂轨迹规划方法A-DPPO,基于相对方向和相对位置设计了一种新型方位奖励函数,通过降低无效探索,提高学习效率。将分布式近似策略优化(DPPO)首次用于机械臂轨迹规划,提高了规划策略的鲁棒性。实验证明相比现有方法,A-DPPO有效地提升了学习效率和规划策略的鲁棒性。

关键词: 深度强化学习, 机械臂, 轨迹规划, 方位奖励函数

Abstract: Current deep reinforcement learning-based methods for trajectory planning of robot manipulators in unknown environments often suffer from low learning efficiency and poor robustness of the planning strategy. To address these problems, a trajectory planning method called A-DPPO, built on a novel azimuth reward function, is proposed. The azimuth reward function is designed from the relative orientation and relative position, and it improves learning efficiency by reducing invalid exploration. Moreover, Distributed Proximal Policy Optimization (DPPO) is applied to trajectory planning for robot manipulators for the first time, which improves the robustness of the planning strategy. Experimental results show that, compared with state-of-the-art methods, the proposed A-DPPO method effectively improves both the learning efficiency and the robustness of the planning strategy.

Key words: deep reinforcement learning, robot manipulator, trajectory planning, azimuth reward function
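
The abstract does not give the formula of the azimuth reward function, so the following is only a minimal sketch of how a reward built from relative position and relative direction might look. It assumes both quantities are measured between the manipulator's end effector and the target. The function name `azimuth_reward`, the weights `w_dist` and `w_dir`, the tolerance `reach_tol`, and the bonus `reach_bonus` are all hypothetical and are not taken from the paper.

```python
import math


def azimuth_reward(ee_prev, ee_pos, target,
                   w_dist=1.0, w_dir=0.5,
                   reach_tol=0.01, reach_bonus=10.0):
    """Illustrative reward (an assumption, not the paper's formula).

    ee_prev, ee_pos: end-effector position before and after the action.
    target: goal position. All are sequences of equal length (e.g. x, y, z).
    """
    # Relative position term: remaining distance to the target.
    dist = math.dist(ee_pos, target)
    if dist < reach_tol:
        return reach_bonus  # target reached

    # Relative direction term: cosine of the angle between the step just
    # taken and the direction from the previous position to the target.
    step = [c - p for c, p in zip(ee_pos, ee_prev)]
    to_target = [t - p for t, p in zip(target, ee_prev)]
    norm_step = math.hypot(*step)
    norm_target = math.hypot(*to_target)
    if norm_step == 0.0 or norm_target == 0.0:
        cos_angle = 0.0  # no motion, so no directional signal
    else:
        dot = sum(s * t for s, t in zip(step, to_target))
        cos_angle = dot / (norm_step * norm_target)

    # Penalise distance, reward motion that points toward the target.
    return -w_dist * dist + w_dir * cos_angle


if __name__ == "__main__":
    start, goal = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(azimuth_reward(start, (0.1, 0.0, 0.0), goal))   # step toward goal
    print(azimuth_reward(start, (-0.1, 0.0, 0.0), goal))  # step away from goal
```

In this sketch, a step that points away from the target is penalised twice, once through the larger distance and once through the negative cosine, which is one plausible way a direction-aware reward could discourage invalid exploration.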