Factorized Asymptotic Bayesian Policy Search for POMDPs

Authors: Masaaki Imaizumi, Ryohei Fujimaki

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards. Our experiments, on simulation and helicopter data, show that FABPS outperforms state-of-the-art POMDP model selection methods with respect to both model selection and total rewards.
Researcher Affiliation | Collaboration | Masaaki Imaizumi, Institute of Statistical Mathematics, insou11@hotmail.com; Ryohei Fujimaki, NEC Corporation, rfujimaki@nec-labs.com
Pseudocode | Yes | Algorithm 1: FABPS algorithm for selecting K
Open Source Code | No | The paper does not contain any statements or links indicating that source code for the described methodology is publicly available, included as supplementary material, or planned for release.
Open Datasets | Yes | We used data for helicopter control, provided by [Abbeel et al., 2010].
Dataset Splits | No | The paper mentions generating 'training data sequences' and using 'helicopter data' (with a citation to [Abbeel et al., 2010]), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for implementation or experimentation.
Experiment Setup | Yes | We set a POMDP model following a manner similar to the MDP data generation in [Ueno et al., 2012]. The latent state space is S = {1, 2, ..., K} and the true K is 3. The action takes any real number, A = R, and the transition probability is p(s_j | s', a') = λ_j, where s_j is the j-th nearest state to s' + a' and Σ_{j=1}^K λ_j = 1. The reward function is r(s, a) = -s^2 - a^2 + 10. We allow the policy to be heterogeneous: π(a | b, θ) = Σ_{k=1}^K b(s_k) N(a | θ_{k,1}, θ_{k,2}^2), where N is a Gaussian density. The synthetic data are generated from this policy, with true parameter values (θ_{1,1}, θ_{1,2}, θ_{2,1}, θ_{2,2}, θ_{3,1}, θ_{3,2}) = (0.5, 1.0, 1.5, 1.5, 1.0, 0.5). The data size is NT = 1500.
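
Below is a minimal Python sketch of the synthetic data generation described in the Experiment Setup row. It is illustrative only: the transition weights λ_j, the initial state distribution, the split of NT = 1500 into N trajectories of length T, the sign convention in the reward (reconstructed here as -s^2 - a^2 + 10 from the garbled source), and the use of a point-mass belief in the policy are all assumptions not fixed by the quoted text.

```python
# Hypothetical sketch of the synthetic POMDP data generation quoted above.
# lambda_j, the initial state distribution, and (N, T) are assumed values.
import numpy as np

rng = np.random.default_rng(0)

K = 3                                   # true number of latent states
states = np.arange(1, K + 1)            # S = {1, ..., K}
lam = np.array([0.7, 0.2, 0.1])         # assumed transition weights, sum to 1
theta = np.array([[0.5, 1.0],           # true (theta_{k,1}, theta_{k,2}) per state
                  [1.5, 1.5],
                  [1.0, 0.5]])

def sample_action(s):
    """Heterogeneous Gaussian policy; here the belief is a point mass on s (assumption)."""
    mean, std = theta[s - 1]
    return rng.normal(mean, std)

def transition(s, a):
    """Next state is the j-th nearest state to s + a with probability lambda_j."""
    order = np.argsort(np.abs(states - (s + a)))  # state indices sorted by distance
    j = rng.choice(K, p=lam)
    return states[order[j]]

def reward(s, a):
    # Assumed sign reconstruction of the extracted "s2 a2 + 10".
    return -s**2 - a**2 + 10.0

def generate(N=30, T=50):
    """Generate N trajectories of length T, so NT = 1500 as in the paper."""
    data = []
    for _ in range(N):
        s = rng.choice(states)           # assumed uniform initial state
        traj = []
        for _ in range(T):
            a = sample_action(s)
            r = reward(s, a)
            traj.append((s, a, r))
            s = transition(s, a)
        data.append(traj)
    return data

trajectories = generate()
```

A fuller reproduction would maintain a belief state b over the latent states (e.g., via a Bayes filter over observations) and mix the K Gaussian components by b(s_k) as in the quoted policy, rather than using the point-mass belief assumed in this sketch.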