Computer Engineering and Applications, 2012, Vol. 48, Issue (15): 7-11.

• Doctoral Forum •

Research on a First-Order POMDP Value Iteration Algorithm Based on First-Order Belief Points

CHEN Lina, HUANG Hongbin, DENG Su

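  1. Key Laboratory of Information System Engineering, National University of Defense Technology, Changsha 410073
  • Publication date: 2012-05-21  Release date: 2012-05-30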

Research on first-order belief point-based value iteration for FO-POMDP

CHEN Lina, HUANG Hongbin, DENG Su   

  1. Key Lab of Information System Engineering, National University of Defense Technology, Changsha 410073, China
  • Online: 2012-05-21  Published: 2012-05-30

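Abstract: This paper studies approximate solution methods for first-order partially observable Markov decision processes (FO-POMDPs). It defines the concepts of first-order belief, first-order belief granularity, and fluent criticality; proposes a granularity resolution method based on fluent criticality that brings first-order beliefs to a common granularity; proposes a distance measure for first-order belief granularities; and proposes the FO-PBVI method, which lifts point-based value iteration (PBVI) to the abstract level. The methods are validated and analyzed on the Tiger and Tag problems. The experiments show that FO-PBVI adapts well to changes in problem size and can solve relatively large planning problems.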
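FO-PBVI lifts PBVI, so it may help to see the propositional algorithm it starts from. The sketch below is ordinary PBVI (point-based backups over a fixed belief set) on the flat Tiger problem, not the paper's first-order method. The Tiger parameters (listen cost -1, rewards -100/+10, 0.85 listening accuracy, discount 0.95), the 11-point belief grid, the iteration count, and the names `backup` and `pbvi` are all assumptions chosen for illustration.

```python
import numpy as np

# Flat (propositional) Tiger POMDP with commonly used parameters; these are
# assumptions, since the abstract does not give the paper's settings.
# States: 0 = tiger-left, 1 = tiger-right
# Actions: 0 = listen, 1 = open-left, 2 = open-right
# Observations: 0 = hear-left, 1 = hear-right
GAMMA = 0.95
S, A, O = 2, 3, 2

T = np.zeros((A, S, S))          # T[a, s, s']
T[0] = np.eye(S)                 # listening leaves the tiger where it is
T[1] = T[2] = 0.5                # opening a door resets the tiger uniformly

Z = np.full((A, S, O), 0.5)      # Z[a, s', o]; uninformative after opening
Z[0] = [[0.85, 0.15], [0.15, 0.85]]

R = np.array([[-1.0, -1.0],      # R[a, s]: listen
              [-100.0, 10.0],    # open-left
              [10.0, -100.0]])   # open-right


def backup(B, G):
    """One point-based backup: keep a single best alpha vector per belief."""
    alphas, actions = [], []
    for b in B:
        best = (-np.inf, None, None)
        for a in range(A):
            alpha_a = R[a].copy()
            for o in range(O):
                # g[i, s] = sum_{s'} T[a, s, s'] * Z[a, s', o] * G[i, s']
                g = (G * Z[a, :, o]) @ T[a].T
                # pick the projected alpha vector that is best at b
                alpha_a += GAMMA * g[np.argmax(g @ b)]
            if alpha_a @ b > best[0]:
                best = (alpha_a @ b, alpha_a, a)
        alphas.append(best[1])
        actions.append(best[2])
    return np.array(alphas), actions


def pbvi(B, n_iter=300):
    # Start from the trivial lower bound min(R) / (1 - gamma) in every state.
    G = np.full((1, S), R.min() / (1.0 - GAMMA))
    for _ in range(n_iter):
        G, actions = backup(B, G)
    return G, actions


# Fixed belief set: P(tiger-left) = 0.0, 0.1, ..., 1.0
B = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 11)]
G, actions = pbvi(B)
names = ["listen", "open-left", "open-right"]
for b, a in zip(B, actions):
    print(f"P(tiger-left)={b[0]:.1f}  action={names[a]}")
```

Keeping one alpha vector per belief point is what bounds the cost of each backup by the size of the belief set rather than by the exponentially growing set of exact alpha vectors; the abstract's claim that FO-PBVI copes with growing problem size builds on this property at the first-order level.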

Keywords: first-order partially observable Markov decision process (POMDP), first-order belief state, granularity resolution, value iteration

Abstract: Approximate solution of FO-POMDPs is an important problem, and this paper studies approximate algorithms for it. The concepts of first-order belief state, belief-state granularity, and fluent criticality are proposed. A granularity resolution method is presented that converts first-order belief states to a common granularity, and a distance measure between different first-order belief states is defined. On this basis, PBVI is lifted to the logical level as FO-PBVI. Experiments on the Tiger and Tag problems show that FO-PBVI is efficient in solving large-scale problems.
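In propositional PBVI, a distance between belief points is what drives belief-set expansion: from each point, one step is simulated under every action, and the successor farthest from the current set is kept. The sketch below shows that flat mechanism with an L1 distance on the Tiger model; it is only an analogue of the paper's first-order belief distance, which the abstract does not define. The Tiger parameters, the L1 metric, and the names `belief_update`, `l1`, and `expand` are assumptions for illustration.

```python
import numpy as np

# Flat Tiger dynamics (assumed standard values); actions 0 = listen,
# 1 = open-left, 2 = open-right. T[a, s, s'], Z[a, s', o].
T = np.zeros((3, 2, 2))
T[0] = np.eye(2)
T[1] = T[2] = 0.5
Z = np.full((3, 2, 2), 0.5)
Z[0] = [[0.85, 0.15], [0.15, 0.85]]


def belief_update(b, a, o):
    """Bayes filter: b'(s') is proportional to Z[a, s', o] * sum_s T[a, s, s'] b(s)."""
    unnorm = Z[a, :, o] * (b @ T[a])
    return unnorm / unnorm.sum()


def l1(b1, b2):
    """L1 distance between two belief vectors (assumed metric)."""
    return float(np.abs(b1 - b2).sum())


def expand(B, rng):
    """PBVI-style expansion: for each b, simulate one step under every action
    and keep the successor belief that lies farthest from the current set."""
    new = list(B)
    for b in B:
        cands = []
        for a in range(T.shape[0]):
            s = rng.choice(2, p=b)            # sample a state from b
            s2 = rng.choice(2, p=T[a, s])     # sample the next state
            o = rng.choice(2, p=Z[a, s2])     # sample an observation
            cands.append(belief_update(b, a, o))
        far = max(cands, key=lambda c: min(l1(c, x) for x in new))
        if min(l1(far, x) for x in new) > 1e-9:   # skip exact duplicates
            new.append(far)
    return new


rng = np.random.default_rng(0)
B = expand([np.array([0.5, 0.5])], rng)
print(B)
```

Starting from the uniform belief, opening a door leads straight back to the uniform belief (distance 0), so the listening successor is the one added to the set.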

Key words: first-order partially observable Markov decision process (FO-POMDP), first-order (FO) belief state, granularity resolution, value iteration