Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (30): 23-25.

• Doctoral Forum •


Self-organizing coordination of multi-robot based on Monte Carlo learning

ZHOU Tong1,HONG Bing-rong1,PIAO Song-hao1,ZHOU Hong-yu2   

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
    2. School of Mechanical and Power Engineering, Harbin University of Science and Technology, Harbin 150080, China
  • Online: 2007-10-21 Published: 2007-10-21
  • Contact: ZHOU Tong


Abstract: Reinforcement learning is an effective way to accomplish tasks in a multi-robot system. While much prior work has focused on optimizing the discounted cumulative reward, the average reward is sometimes a more suitable criterion for multi-robot coordination. Learning algorithms based on discounted rewards, such as Q-learning, can perform well at the action level but not at the task level, whereas learning methods based on the average reward, such as the Monte Carlo algorithm, are capable of achieving the optimal result through cooperation at the task level. Experiments on real robots show that the algorithm adopting the average reward is superior to the one adopting the discounted cumulative reward.
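The contrast between the two criteria can be sketched with a toy episode (a hypothetical illustration, not the authors' implementation; the reward sequence and discount factor are assumptions for demonstration only):

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward: sum over t of gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def average_reward(rewards):
    """Average reward: (1/T) * sum over t of r_t, weighting every step equally."""
    return sum(rewards) / len(rewards)

# Toy episode of a long cooperative task whose payoff arrives only at the end.
rewards = [0, 0, 0, 0, 10]

# The discounted criterion shrinks the late task-level reward (0.9^4 * 10),
# while the average-reward criterion values it independently of when it occurs.
print(discounted_return(rewards))
print(average_reward(rewards))
```

This illustrates the abstract's point: under discounting, a reward earned only after a long cooperative task is attenuated, which can bias learning toward short-horizon action-level gains; the average-reward criterion removes that time bias at the task level.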