Bayesian Policy Optimization for Model Uncertainty
Authors: Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, "Experimental Results": "We evaluate BPO on discrete and continuous POMDP benchmarks to highlight its use of information-gathering actions. We also evaluate BPO on BAMDP problems constructed by varying physical model parameters on OpenAI benchmark problems (Brockman et al., 2016)." |
| Researcher Affiliation | Academia | Paul G. Allen School of Computer Science & Engineering University of Washington {gilwoo,bhou,adityavk,jslee02,sanjibac,siddh}@cs.uw.edu |
| Pseudocode | Yes | Algorithm 1: Bayesian Policy Optimization (a hedged sketch of this loop follows the table) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for their method, nor does it provide any links to a repository. |
| Open Datasets | Yes | "We evaluate BPO on discrete and continuous POMDP benchmarks to highlight its use of information-gathering actions. We also evaluate BPO on BAMDP problems constructed by varying physical model parameters on OpenAI benchmark problems (Brockman et al., 2016)." |
| Dataset Splits | No | The paper describes the reinforcement learning environments and training parameters but does not specify explicit training/validation/test dataset splits as percentages or sample counts in the way a supervised learning paper would. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using TRPO and an implementation provided by Duan et al. (2016) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, specific library versions). |
| Experiment Setup | Yes | Appendix Table 1 ("Training parameters") specifies max episode length, batch size, training iterations, discount (γ), KL step size (D_KL), and GAE λ for the Tiger, Chain, Light Dark, and MuJoCo environments (a hedged config sketch follows below). |
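
The Pseudocode row points at Algorithm 1, which alternates belief-tracking rollouts in sampled MDPs with a batch policy-optimization step (TRPO in the paper). Below is a minimal sketch of that structure, assuming hypothetical `env_sampler`, `bayes_filter`, `prior`, and `trpo_update` helpers — none of these names come from the paper, and this is an illustration of the loop, not the authors' implementation.

```python
def rollout(env_sampler, policy, bayes_filter, prior, max_steps):
    """Collect one episode: the policy conditions on (state, belief), and the
    belief over latent model parameters is updated by a Bayes filter."""
    phi = prior.sample()             # latent model parameters, hidden from the policy
    env = env_sampler(phi)           # instantiate the sampled MDP
    belief = prior.initial_belief()  # belief over phi, visible to the policy
    s = env.reset()
    trajectory = []
    for _ in range(max_steps):
        a = policy.act(s, belief)    # policy input: state + belief
        s_next, r, done = env.step(a)
        belief = bayes_filter.update(belief, s, a, s_next)
        trajectory.append((s, belief, a, r))
        s = s_next
        if done:
            break
    return trajectory

def bpo_train(env_sampler, policy, bayes_filter, prior,
              n_iters, batch_size, max_steps, trpo_update):
    """Outer loop in the style of Algorithm 1: gather a batch of belief-augmented
    transitions, then take one batch policy-optimization step."""
    for _ in range(n_iters):
        batch = []
        while len(batch) < batch_size:
            batch.extend(rollout(env_sampler, policy, bayes_filter, prior, max_steps))
        trpo_update(policy, batch)   # stand-in for the TRPO step used in the paper
    return policy
```

The key design point the sketch preserves is that the latent parameters φ are sampled once per episode and never shown to the policy; only the belief is, which is what lets a single policy trade off exploration and exploitation across the model distribution.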
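
For the Experiment Setup row, a hedged sketch of how the hyperparameters named in Appendix Table 1 might be organized for a reproduction. The numeric values are deliberately left as `None`: the paper reports them per environment, and they are not restated in this summary.

```python
# Hypothetical layout for the hyperparameters named in Appendix Table 1.
# All values are placeholders; consult the paper's appendix for the actual numbers.
TRAINING_PARAMS = {
    env: {
        "max_episode_length": None,
        "batch_size": None,
        "training_iterations": None,
        "discount_gamma": None,   # γ
        "kl_step_size": None,     # D_KL constraint for the TRPO update
        "gae_lambda": None,       # λ for generalized advantage estimation
    }
    for env in ("Tiger", "Chain", "Light Dark", "MuJoCo")
}
```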