Bayesian Policy Optimization for Model Uncertainty
Authors: Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, "Experimental Results": "We evaluate BPO on discrete and continuous POMDP benchmarks to highlight its use of information-gathering actions. We also evaluate BPO on BAMDP problems constructed by varying physical model parameters on OpenAI benchmark problems (Brockman et al., 2016)." |
| Researcher Affiliation | Academia | Paul G. Allen School of Computer Science & Engineering University of Washington {gilwoo,bhou,adityavk,jslee02,sanjibac,siddh}@cs.uw.edu |
| Pseudocode | Yes | Algorithm 1: Bayesian Policy Optimization (a hedged sketch of this loop follows the table) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for their method, nor does it provide any links to a repository. |
| Open Datasets | Yes | "We evaluate BPO on discrete and continuous POMDP benchmarks to highlight its use of information-gathering actions. We also evaluate BPO on BAMDP problems constructed by varying physical model parameters on OpenAI benchmark problems (Brockman et al., 2016)." |
| Dataset Splits | No | The paper describes the reinforcement learning environments and training parameters but does not specify explicit training/validation/test dataset splits as percentages or sample counts in the way a supervised learning paper would. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using TRPO and an implementation provided by Duan et al. (2016) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, specific library versions). |
| Experiment Setup | Yes | Appendix Table 1 ("Training parameters") specifies max episode length, batch size, training iterations, discount (γ), KL step size (D_KL), and GAE λ for the Tiger, Chain, Light Dark, and MuJoCo environments (a hedged config sketch follows below). |
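
The Pseudocode row points at Algorithm 1, which alternates belief-tracking rollouts in sampled MDPs with a batch policy-optimization step (TRPO in the paper). Below is a minimal sketch of that structure, assuming hypothetical `env_sampler`, `bayes_filter`, `prior`, and `trpo_update` helpers — none of these names come from the paper, and this is an illustration of the loop, not the authors' implementation.

```python
def rollout(env_sampler, policy, bayes_filter, prior, max_steps):
    """Collect one episode: the policy conditions on (state, belief), and the
    belief over latent model parameters is updated by a Bayes filter."""
    phi = prior.sample()             # latent model parameters, hidden from the policy
    env = env_sampler(phi)           # instantiate the sampled MDP
    belief = prior.initial_belief()  # belief over phi, visible to the policy
    s = env.reset()
    trajectory = []
    for _ in range(max_steps):
        a = policy.act(s, belief)    # policy input: state + belief
        s_next, r, done = env.step(a)
        belief = bayes_filter.update(belief, s, a, s_next)
        trajectory.append((s, belief, a, r))
        s = s_next
        if done:
            break
    return trajectory

def bpo_train(env_sampler, policy, bayes_filter, prior,
              n_iters, batch_size, max_steps, trpo_update):
    """Outer loop in the style of Algorithm 1: gather a batch of belief-augmented
    transitions, then take one batch policy-optimization step."""
    for _ in range(n_iters):
        batch = []
        while len(batch) < batch_size:
            batch.extend(rollout(env_sampler, policy, bayes_filter, prior, max_steps))
        trpo_update(policy, batch)   # stand-in for the TRPO step used in the paper
    return policy
```

The key design point the sketch preserves is that the latent parameters φ are sampled once per episode and never shown to the policy; only the belief is, which is what lets a single policy trade off exploration and exploitation across the model distribution.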
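
For the Experiment Setup row, a hedged sketch of how the hyperparameters named in Appendix Table 1 might be organized for a reproduction. The numeric values are deliberately left as `None`: the paper reports them per environment, and they are not restated in this summary.

```python
# Hypothetical layout for the hyperparameters named in Appendix Table 1.
# All values are placeholders; consult the paper's appendix for the actual numbers.
TRAINING_PARAMS = {
    env: {
        "max_episode_length": None,
        "batch_size": None,
        "training_iterations": None,
        "discount_gamma": None,   # γ
        "kl_step_size": None,     # D_KL constraint for the TRPO update
        "gae_lambda": None,       # λ for generalized advantage estimation
    }
    for env in ("Tiger", "Chain", "Light Dark", "MuJoCo")
}
```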