Self-organizing coordination of multi-robot based on Monte Carlo learning

Computer Engineering and Applications, 2007, Vol. 43, Issue (30): 23-25.

ZHOU Tong¹, HONG Bing-rong¹, PIAO Song-hao¹, ZHOU Hong-yu²

1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
2. School of Mechanical and Power Engineering, Harbin University of Science and Technology, Harbin 150080, China

Online: 2007-10-21 Published: 2007-10-21
Contact: ZHOU Tong

Abstract

Abstract: Reinforcement learning is an effective way to improve the efficiency with which robots accomplish tasks. Most popular learning methods optimize the discounted cumulative reward, but the average reward is in some respects a more suitable criterion for multi-robot coordination. Learning algorithms based on discounted rewards, such as Q-learning, can improve performance at the level of individual robot actions, but they do not produce good cooperation at the multi-robot task level. Learning methods based on the average reward, such as the Monte Carlo algorithm, can overcome this limitation and achieve good cooperation at the task level. This paper applies Monte Carlo learning based on the average reward to multi-robot cooperation and obtains good learning results. Experiments on real robots show that the algorithm using the average reward outperforms the one using the discounted cumulative reward.
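
The difference between the two reward criteria named in the abstract can be illustrated with a small sketch. The abstract does not give the paper's algorithm, so what follows is only a generic every-visit Monte Carlo estimator using the average reward as its target, written under stated assumptions: the names `discounted_return`, `average_reward`, and `MonteCarloAverageEstimator`, the state and action labels, and all reward values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Step = Tuple[str, str, float]  # (state, action, reward); labels are hypothetical


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted cumulative reward: sum of gamma**k * rewards[k]."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def average_reward(rewards: Sequence[float]) -> float:
    """Average reward per step over a finite episode."""
    return sum(rewards) / len(rewards)


class MonteCarloAverageEstimator:
    """Every-visit Monte Carlo estimate with the average reward as target.

    Assumption: this is a generic textbook-style sketch, not the method
    used in the paper.
    """

    def __init__(self) -> None:
        self.value: Dict[Tuple[str, str], float] = {}
        self.count: Dict[Tuple[str, str], int] = {}

    def update(self, episode: List[Step]) -> None:
        rewards = [reward for _, _, reward in episode]
        for t, (state, action, _) in enumerate(episode):
            # Target: average reward from step t to the end of the episode.
            target = average_reward(rewards[t:])
            key = (state, action)
            n = self.count.get(key, 0) + 1
            self.count[key] = n
            old = self.value.get(key, 0.0)
            # Incremental mean over all visits to (state, action).
            self.value[key] = old + (target - old) / n


if __name__ == "__main__":
    # Invented reward sequences: an early small payoff versus a late
    # larger payoff that stands in for a completed cooperative task.
    greedy = [1.0, 0.0, 0.0, 0.0]
    cooperative = [0.0, 0.0, 0.0, 2.0]
    gamma = 0.5
    print(discounted_return(greedy, gamma), discounted_return(cooperative, gamma))
    print(average_reward(greedy), average_reward(cooperative))
```

With these invented numbers, the discounted criterion ranks the early-payoff sequence higher (1.0 versus 0.25), while the average criterion ranks the late-payoff sequence higher (0.5 versus 0.25). This is one way to read the abstract's claim that discounting favours action-level gains over task-level cooperation, although the paper's own experiments may rest on a different mechanism.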

ZHOU Tong¹, HONG Bing-rong¹, PIAO Song-hao¹, ZHOU Hong-yu². Self-organizing coordination of multi-robot based on Monte Carlo learning[J]. Computer Engineering and Applications, 2007, 43(30): 23-25.