Application of action prediction in multi-robot reinforcement learning cooperation

Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (8): 257-260.

CAO Jie, ZHU Ningning

College of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

Online: 2013-04-15 Published: 2013-04-15

Abstract

Abstract: In a multi-robot system, the learning space for reinforcement learning in cooperative environment exploration grows exponentially with the number of robots, and this very large space makes convergence extremely slow. To address this problem, a reinforcement learning method based on action prediction, together with its action selection strategy, is applied to multi-robot cooperation: predicting the probabilities of the actions that other robots may execute speeds up the convergence of the learning algorithm. Experimental results show that the reinforcement learning method based on action prediction obtains a multi-robot cooperation strategy faster than the original algorithm.

Key words: action prediction, reinforcement learning, multi-robot cooperation
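
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a tabular Q-learner keeps values over joint actions, estimates the probability of a teammate's action from observed frequencies, and picks the action with the highest expected value under that prediction. The class name `PredictiveQAgent`, the frequency-count predictor, the epsilon-greedy selection, and all parameter values are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


class PredictiveQAgent:
    """Tabular Q-learner that predicts a teammate's action (illustrative only)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # q[state][own_action][other_action]: value of a joint action.
        self.q = defaultdict(
            lambda: [[0.0] * n_actions for _ in range(n_actions)]
        )
        # counts[state][other_action]: observed teammate actions,
        # started at 1 so that the initial prediction is uniform.
        self.counts = defaultdict(lambda: [1] * n_actions)

    def predict(self, state):
        """Estimated probability of each teammate action in this state."""
        c = self.counts[state]
        total = sum(c)
        return [x / total for x in c]

    def expected_values(self, state):
        """Expected value of each own action under the predicted distribution."""
        p = self.predict(state)
        table = self.q[state]
        return [
            sum(p[b] * table[a][b] for b in range(self.n_actions))
            for a in range(self.n_actions)
        ]

    def act(self, state):
        """Epsilon-greedy choice over the expected values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        ev = self.expected_values(state)
        return max(range(self.n_actions), key=ev.__getitem__)

    def update(self, state, own, other, reward, next_state):
        """Record the teammate's action, then apply a Q-learning update."""
        self.counts[state][other] += 1
        target = reward + self.gamma * max(self.expected_values(next_state))
        self.q[state][own][other] += self.alpha * (
            target - self.q[state][own][other]
        )
```

In this sketch each robot would call `act` to choose its move, observe the teammate's move and the reward, and then call `update`. The prediction lets the agent evaluate only its own actions at decision time instead of searching the full joint-action space, which is one plausible way to read the speed-up the abstract reports.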

CAO Jie, ZHU Ningning. Application of action prediction in multi-robot reinforcement learning cooperation[J]. Computer Engineering and Applications, 2013, 49(8): 257-260.
