Reinforcement learning for Multi-Agents Systems and its application in RoboCup

Computer Engineering and Applications, 2008, Vol. 44, Issue (23): 46-48. DOI: 10.3778/j.issn.1002-8331.2008.23.014

LIU Guo-dong, YANG Bao-qing

Research Center of Control Science and Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China (the English byline gives the School of Communication and Control Engineering)

Received: 2007-10-18 Revised: 2008-01-21 Online: 2008-08-11 Published: 2008-08-11
Corresponding author: LIU Guo-dong

Abstract

Because other agents are present, the environment of a Multi-Agent System (MAS) cannot simply be treated as a Markov Decision Process (MDP). Current reinforcement learning methods, which are based on MDPs, must therefore be adapted before they can be applied to MAS. For multi-agent systems in non-deterministic Markov environments, this paper proposes a multi-agent Q-learning model and a novel Q-learning algorithm that builds on each agent's independent learning ability: an agent learns the action policies of the other agents by observing the joint action and keeping statistics on it. The policies of the other agents are expressed as action probability distribution matrices, and a concise yet useful method for updating these matrices is proposed. The full joint probability of the distribution matrices guarantees that the learning agent chooses its optimal action. In experiments, the agents' decision-making was implemented successfully and the overall competitive ability of the AFU team improved, which shows that the approach is effective and feasible.

Key words: Multi-Agent Systems (MAS), reinforcement learning, Robot World Cup (RoboCup)
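
The abstract does not give the update rules, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes that each other agent's policy is estimated from per-state action frequencies, that the other agents act independently (so the joint probability is a product), and that a standard Q-learning update is applied to joint-action values. The class name `JointActionQLearner`, its parameters, and the toy action names are all hypothetical.

```python
import random
from collections import defaultdict
from itertools import product


class JointActionQLearner:
    """Hypothetical sketch: Q-learning over joint actions plus frequency-based
    estimates of the other agents' policies (an assumption, not the paper's rule)."""

    def __init__(self, my_actions, other_actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.my_actions = list(my_actions)
        # other_actions: one list of actions per other agent
        self.other_actions = [list(acts) for acts in other_actions]
        self.other_joint = list(product(*self.other_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)     # (state, my_action, others) -> value
        self.counts = defaultdict(int)  # (state, agent_index, action) -> count
        self.visits = defaultdict(int)  # state -> number of observed joint actions
        self.rng = random.Random(seed)

    def others_prob(self, state, others):
        """Estimated probability of the others' joint action in this state.
        Uniform before any observation; product of per-agent frequencies after."""
        n = self.visits[state]
        p = 1.0
        for j, a in enumerate(others):
            if n == 0:
                p *= 1.0 / len(self.other_actions[j])
            else:
                p *= self.counts[(state, j, a)] / n
        return p

    def expected_q(self, state, action):
        """Q-value of my action, averaged over the others' estimated joint policy."""
        return sum(self.others_prob(state, o) * self.q[(state, action, o)]
                   for o in self.other_joint)

    def best_action(self, state):
        return max(self.my_actions, key=lambda a: self.expected_q(state, a))

    def choose(self, state):
        """Epsilon-greedy selection on the expected Q-values."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.my_actions)
        return self.best_action(state)

    def update(self, state, action, others, reward, next_state):
        """Record the observed joint action, then apply a Q-learning update."""
        others = tuple(others)
        for j, a in enumerate(others):
            self.counts[(state, j, a)] += 1
        self.visits[state] += 1
        target = reward + self.gamma * max(
            self.expected_q(next_state, a) for a in self.my_actions)
        key = (state, action, others)
        self.q[key] += self.alpha * (target - self.q[key])
```

In use, a learner would call `choose(state)` to pick its own action, observe what the other agents did together with the reward, and then call `update(...)` with that joint observation. The independence assumption keeps the policy tables small, at the cost of ignoring any coordination among the other agents.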

LIU Guo-dong, YANG Bao-qing. Reinforcement learning for Multi-Agents Systems and its application in RoboCup[J]. Computer Engineering and Applications, 2008, 44(23): 46-48.

