Factorized Asymptotic Bayesian Policy Search for POMDPs
Authors: Masaaki Imaizumi, Ryohei Fujimaki
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards. Our experiments, on simulation and helicopter data, show that FABPS outperforms state-of-the-art POMDP model selection methods in terms of both model selection and total rewards. |
| Researcher Affiliation | Collaboration | Masaaki Imaizumi, Institute of Statistical Mathematics (insou11@hotmail.com); Ryohei Fujimaki, NEC Corporation (rfujimaki@nec-labs.com) |
| Pseudocode | Yes | Algorithm 1 FABPS algorithm for selecting K |
| Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available, in supplementary material, or will be released. |
| Open Datasets | Yes | We used data for helicopter control, provided by [Abbeel et al., 2010]. |
| Dataset Splits | No | The paper mentions generating 'training data sequences' and using 'helicopter data' (with a citation to [Abbeel et al., 2010]), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for implementation or experimentation. |
| Experiment Setup | Yes | We set up a POMDP model as follows, following a similar manner to the data generation for MDP models in [Ueno et al., 2012]. The latent state space is S = {1, 2, ..., K} and the true K is 3. The action can take any real number, A = ℝ, and the transition probability is p(s′ \| s, a) = λ_j, where s′ is the j-th nearest state to s + a and Σ_{j=1}^{K} λ_j = 1. The reward function is r(s, a) = −s² − a² + 10. We allow the policy to be heterogeneous: π(a \| b, θ) = Σ_{k=1}^{K} b(s_k) N(a \| θ_{k,1}, θ_{k,2}²), where N is a Gaussian density. The synthetic data are generated from this policy function with true parameter values (θ_{1,1}, θ_{1,2}, θ_{2,1}, θ_{2,2}, θ_{3,1}, θ_{3,2}) = (0.5, 1.0, 1.5, 1.5, 1.0, 0.5). The data size is set to NT = 1500. |
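
The experiment-setup row above describes a small synthetic POMDP (K = 3 latent states, a heterogeneous Gaussian policy, and NT = 1500 samples). The sketch below illustrates one way to generate data under that description; it is not the authors' code. The transition weights `lambdas`, the uniform initial belief, the reward sign convention (the operators are garbled in the extracted text), and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                                  # true number of latent states
states = np.arange(1, K + 1)           # S = {1, 2, 3}
lambdas = np.array([0.6, 0.3, 0.1])    # transition weights summing to 1 (values assumed)
# True policy parameters from the paper: theta_{k,1} = mean, theta_{k,2} = std
theta = np.array([[0.5, 1.0],
                  [1.5, 1.5],
                  [1.0, 0.5]])
n_samples = 1500                       # NT = 1500

def sample_action(belief):
    """Heterogeneous policy: Gaussian mixture weighted by the belief b(s_k)."""
    k = rng.choice(K, p=belief)
    return rng.normal(theta[k, 0], theta[k, 1])

def transition(s, a):
    """p(s'|s,a) = lambda_j, where s' is the j-th nearest state to s + a."""
    order = np.argsort(np.abs(states - (s + a)))   # states sorted by distance to s + a
    j = rng.choice(K, p=lambdas)
    return states[order[j]]

def reward(s, a):
    # Quadratic reward with offset; the sign convention is an assumption.
    return -s**2 - a**2 + 10

s = rng.choice(states)
belief = np.full(K, 1.0 / K)           # uniform initial belief (assumed)
trajectory = []
for _ in range(n_samples):
    a = sample_action(belief)
    trajectory.append((s, a, reward(s, a)))
    s = transition(s, a)

print(f"generated {len(trajectory)} (state, action, reward) triples")
```

The sketch keeps the belief fixed for brevity; a full reproduction would also need the paper's observation model and belief update, which are not specified in the quoted setup.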