Bayesian Policy Optimization for Model Uncertainty

Authors: Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | (Sec. 5, Experimental Results) "We evaluate BPO on discrete and continuous POMDP benchmarks to highlight its use of information-gathering actions. We also evaluate BPO on BAMDP problems constructed by varying physical model parameters on OpenAI benchmark problems (Brockman et al., 2016)." |
| Researcher Affiliation | Academia | Paul G. Allen School of Computer Science & Engineering, University of Washington. {gilwoo,bhou,adityavk,jslee02,sanjibac,siddh}@cs.uw.edu |
| Pseudocode | Yes | Algorithm 1: Bayesian Policy Optimization |
| Open Source Code | No | The paper contains no explicit statement about releasing source code for the method and provides no repository link. |
| Open Datasets | Yes | Same passage as for Research Type: "We evaluate BPO on discrete and continuous POMDP benchmarks ... on OpenAI benchmark problems (Brockman et al., 2016)." |
| Dataset Splits | No | The paper describes the reinforcement learning environments and training parameters but does not specify explicit train/validation/test splits (as percentages or sample counts) in the way a supervised learning paper would. |
| Hardware Specification | No | The paper does not state the hardware used for its experiments (GPU/CPU models, processor types, or memory amounts). |
| Software Dependencies | No | The paper mentions using TRPO and the implementation provided by Duan et al. (2016) but does not list software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | Appendix Table 1 (Training parameters) specifies max. episode length, batch size, training iterations, discount (γ), step size (D_KL), and GAE λ for Tiger, Chain, Light-Dark, and MuJoCo. |
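The Pseudocode row above refers to Algorithm 1 (Bayesian Policy Optimization), in which a policy acts on a belief-augmented state for a BAMDP. As a hedged illustration only (this is not the paper's code; the discrete model set, function names, and toy numbers are assumptions), a minimal Bayes-filter belief update over candidate models and the resulting belief-augmented policy input might look like:

```python
import numpy as np

def belief_update(belief, likelihoods):
    """Bayes-filter update over a discrete set of K candidate models.

    belief:      prior probabilities over models, shape (K,)
    likelihoods: p(observation | model_k) for each model, shape (K,)
    Returns the normalized posterior over models.
    """
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0.0:
        # Observation impossible under every candidate model: keep the prior.
        return belief
    return posterior / total

def belief_augmented_input(state, belief):
    """Concatenate the observable state with the belief vector, giving the
    input a belief-augmented policy would consume."""
    return np.concatenate([state, belief])

# Toy example: two candidate dynamics models, observation favoring model 1.
prior = np.array([0.5, 0.5])
likelihoods = np.array([0.2, 0.8])
posterior = belief_update(prior, likelihoods)       # -> [0.2, 0.8]
policy_input = belief_augmented_input(np.array([1.0, -0.5]), posterior)
```

In this sketch the posterior, not the raw state alone, is what the policy conditions on, which is what lets such a policy select information-gathering actions that sharpen the belief.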