Factorized Asymptotic Bayesian Policy Search for POMDPs

Authors: Masaaki Imaizumi, Ryohei Fujimaki

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards. Our experiments, on simulation and helicopter data, show that FABPS outperforms state-of-the-art POMDP model selection methods with respect to both model selection and total rewards.
Researcher Affiliation | Collaboration | Masaaki Imaizumi, Institute of Statistical Mathematics, insou11@hotmail.com; Ryohei Fujimaki, NEC Corporation, rfujimaki@nec-labs.com
Pseudocode | Yes | Algorithm 1: FABPS algorithm for selecting K
Open Source Code | No | The paper does not contain any statements or links indicating that source code for the described methodology is publicly available, included as supplementary material, or planned for release.
Open Datasets | Yes | We used data for helicopter control, provided by [Abbeel et al., 2010].
Dataset Splits | No | The paper mentions generating 'training data sequences' and using 'helicopter data' (with a citation to [Abbeel et al., 2010]), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for implementation or experimentation.
Experiment Setup | Yes | We set a POMDP model following a manner similar to the MDP data generation in [Ueno et al., 2012]. The latent state space is S = {1, 2, ..., K} and the true K is 3. The action takes any real number, A = R, and the transition probability is p(s_j | s', a') = λ_j, where s_j is the j-th nearest state to s' + a' and Σ_{j=1}^K λ_j = 1. The reward function is r(s, a) = -s^2 - a^2 + 10. We allow the policy to be heterogeneous: π(a | b, θ) = Σ_{k=1}^K b(s_k) N(a | θ_{k,1}, θ_{k,2}^2), where N is a Gaussian density. The synthetic data are generated from this policy, with true parameter values (θ_{1,1}, θ_{1,2}, θ_{2,1}, θ_{2,2}, θ_{3,1}, θ_{3,2}) = (0.5, 1.0, 1.5, 1.5, 1.0, 0.5). The data size is NT = 1500.
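
Below is a minimal Python sketch of the synthetic data generation described in the Experiment Setup row. It is illustrative only: the transition weights λ_j, the initial state distribution, the split of NT = 1500 into N trajectories of length T, the sign convention in the reward (reconstructed here as -s^2 - a^2 + 10 from the garbled source), and the use of a point-mass belief in the policy are all assumptions not fixed by the quoted text.

```python
# Hypothetical sketch of the synthetic POMDP data generation quoted above.
# lambda_j, the initial state distribution, and (N, T) are assumed values.
import numpy as np

rng = np.random.default_rng(0)

K = 3                                   # true number of latent states
states = np.arange(1, K + 1)            # S = {1, ..., K}
lam = np.array([0.7, 0.2, 0.1])         # assumed transition weights, sum to 1
theta = np.array([[0.5, 1.0],           # true (theta_{k,1}, theta_{k,2}) per state
                  [1.5, 1.5],
                  [1.0, 0.5]])

def sample_action(s):
    """Heterogeneous Gaussian policy; here the belief is a point mass on s (assumption)."""
    mean, std = theta[s - 1]
    return rng.normal(mean, std)

def transition(s, a):
    """Next state is the j-th nearest state to s + a with probability lambda_j."""
    order = np.argsort(np.abs(states - (s + a)))  # state indices sorted by distance
    j = rng.choice(K, p=lam)
    return states[order[j]]

def reward(s, a):
    # Assumed sign reconstruction of the extracted "s2 a2 + 10".
    return -s**2 - a**2 + 10.0

def generate(N=30, T=50):
    """Generate N trajectories of length T, so NT = 1500 as in the paper."""
    data = []
    for _ in range(N):
        s = rng.choice(states)           # assumed uniform initial state
        traj = []
        for _ in range(T):
            a = sample_action(s)
            r = reward(s, a)
            traj.append((s, a, r))
            s = transition(s, a)
        data.append(traj)
    return data

trajectories = generate()
```

A fuller reproduction would maintain a belief state b over the latent states (e.g., via a Bayes filter over observations) and mix the K Gaussian components by b(s_k) as in the quoted policy, rather than using the point-mass belief assumed in this sketch.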