Research of profit-sharing reinforcement learning method based on semi-autonomous agent

Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (15): 72-75.

YANG Ke-wei, ZHANG Shao-ding, CEN Kai-hui, TAN Yue-jin

School of Information Systems and Management, National University of Defense Technology, Changsha 410073, China

Online: 2007-05-21 Published: 2007-05-21
Corresponding author: YANG Ke-wei

Abstract

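Abstract: The profit-sharing reinforcement learning method is applied to a system based on semi-autonomous agents and compared with Q-learning, a reinforcement learning method based on dynamic programming. In dynamic environments with many uncertain factors, where the evolution of the system state is not a Markov process, profit-sharing has a clear advantage. Based on the defining property of a semi-autonomous agent, namely that it is subject to external control, a reinforcement learning model for semi-autonomous agents is proposed. The profit-sharing model is then analyzed experimentally, using the search for a safe, concealed position in a battlefield simulation as the example.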

Key words: reinforcement learning, semi-autonomous agent, profit-sharing, Q-learning
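
To illustrate the contrast the abstract draws, the following is a minimal sketch of the two update rules in their common textbook forms; it is not the model proposed in the paper. Profit-sharing distributes an episode's final reward backward over the state-action pairs that were visited, while Q-learning backs up a one-step estimate from the next state, which is the step that relies on the Markov property. The function names, the geometric credit function, and the parameter values (`decay`, `alpha`, `gamma`) are assumptions chosen for illustration.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Share one terminal reward backward over an episode's (state, action) pairs.

    The geometric credit function (reward * decay**k for the pair k steps
    before the reward) is an assumed, commonly used choice.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay  # earlier steps receive geometrically less credit
    return weights


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One-step Q-learning backup, bootstrapping from the next state's best value."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q


# Hypothetical three-step episode that ends with a reward of 1.0.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
profit_sharing_update(weights, episode, reward=1.0, decay=0.5)

# A single Q-learning backup for the final transition of the same episode.
q = defaultdict(float)
q_learning_update(q, "s2", "up", 1.0, "goal", actions=["up", "right"])
```

In this sketch profit-sharing never consults an estimate for the successor state, so it makes no use of a state-transition model; Q-learning's `best_next` term does depend on the successor state being an adequate summary of the future.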

YANG Ke-wei, ZHANG Shao-ding, CEN Kai-hui, TAN Yue-jin. Research of profit-sharing reinforcement learning method based on semi-autonomous agent[J]. Computer Engineering and Applications, 2007, 43(15): 72-75.
