Hierarchical regional cooperative Q-learning

Computer Engineering and Applications, 2009, Vol. 45, Issue (22): 7-9. DOI: 10.3778/j.issn.1002-8331.2009.22.003

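LIU Liang (刘亮), LI Long-shu (李龙澍)

Key Lab of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei 230039, China

Received: 2009-04-07 Revised: 2009-05-14 Online: 2009-08-01 Published: 2009-08-01
Corresponding author: LIU Liang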

Abstract

Abstract: Many multi-agent Q-learning problems become intractable because the number of joint actions grows exponentially with the number of agents. Starting from hierarchical reinforcement learning, and building on a study of cooperation in multi-agent systems (MAS) under reinforcement learning and of the system's working logic, this paper proposes a regional cooperative reinforcement learning method based on a hierarchy of the learning process. By examining the knowledge acquired through independent-Agent reinforcement learning, the method improves the learning efficiency of the multi-Agent system and further enhances the performance of regional cooperative reinforcement learning. It is combined with mission actions based on joint actions and a potential field model, so as to address the curse of dimensionality in the state space of reinforcement learning. The algorithm is applied to a subtask of simulated robot soccer, 2vs1 defense, and its effectiveness is validated by experiments.

Key words: Multi-Agent Systems (MAS), regional cooperation, Q-learning, process stratification
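
The scaling problem named in the abstract can be made concrete with a short sketch. The Python snippet below is an illustration only, not the paper's algorithm: it compares the number of Q-table entries a joint-action learner needs with the number needed when each agent learns over its own actions. The function names and the state and action counts are hypothetical, chosen just for the example.

```python
def joint_q_entries(num_states: int, actions_per_agent: int, num_agents: int) -> int:
    """Q-table size when a single table is indexed by the joint action of all agents."""
    # Each agent picks one of `actions_per_agent` actions, so the number of
    # joint actions is actions_per_agent ** num_agents (exponential in agents).
    return num_states * actions_per_agent ** num_agents


def independent_q_entries(num_states: int, actions_per_agent: int, num_agents: int) -> int:
    """Total Q-table size when every agent keeps a table over its own actions only."""
    # One table of num_states * actions_per_agent entries per agent (linear in agents).
    return num_agents * num_states * actions_per_agent


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the paper.
    num_states, actions_per_agent = 100, 5
    for n in range(1, 6):
        joint = joint_q_entries(num_states, actions_per_agent, n)
        indep = independent_q_entries(num_states, actions_per_agent, n)
        print(f"agents={n}  joint={joint}  independent={indep}")
```

With these assumed sizes, three agents already require 12,500 joint entries against 1,500 independent ones, which is the gap that restricting cooperation to local regions is meant to narrow.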

LIU Liang, LI Long-shu. Hierarchical regional cooperative Q-learning[J]. Computer Engineering and Applications, 2009, 45(22): 7-9.
