Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
Authors: Marek Petrik, Reazul Hasan Russel
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate the safe estimates computed using Hoeffding, BCI, and RSVF ambiguity sets. We start by assuming a true model and generate simulated datasets from it. |
| Researcher Affiliation | Academia | Reazul Hasan Russel, Department of Computer Science, University of New Hampshire, rrussel@cs.unh.edu; Marek Petrik, Department of Computer Science, University of New Hampshire, mpetrik@cs.unh.edu |
| Pseudocode | Yes | Algorithm 1: RSVF: Adapted Ambiguity Sets; Algorithm 2, in the appendix, summarizes the sort-based method. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology, nor does it include links to a code repository. |
| Open Datasets | Yes | We start by assuming a true model and generate simulated datasets from it. Each dataset is then used to construct an ambiguity set and a safe estimate of policy return. We first use the standard River Swim domain for the evaluation [36]. |
| Dataset Splits | No | The paper mentions generating simulated datasets and varying the number of samples, but it does not provide specific train/validation/test split percentages or sample counts to reproduce the data partitioning. |
| Hardware Specification | No | The paper states that computational complexity is not evaluated as it targets data-constrained problems, but it does not provide any specific hardware details (like CPU/GPU models or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "MCMC sampling libraries like JAGS, Stan, or others [11]" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The value function for the states s1, ..., s5 is fixed to be [1, 2, 3, 4, 5]. RSVF is run for a single iteration with the given value function. The ground truth is generated from the corresponding prior for each one of the problems. All Bayesian methods draw 1,000 samples from the posterior for each state and action. As the prior distribution, we use the uniform Dirichlet distribution over all states. |
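
The experiment setup above can be illustrated with a minimal sketch. It is not the paper's RSVF algorithm and does not construct any ambiguity set. It assumes a single state-action pair with made-up transition counts, uses the uniform Dirichlet prior and the 1,000 posterior samples mentioned in the table, and reports a lower quantile of the expected next-state value for the fixed value function [1, 2, 3, 4, 5]. The function names, the counts, and the risk level `delta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def posterior_samples(counts, n_samples=1000, seed=0):
    """Draw transition distributions from a Dirichlet posterior.

    Assumes a uniform Dirichlet prior (all concentration parameters 1),
    so the posterior concentration is 1 + observed transition counts.
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 + np.asarray(counts, dtype=float)
    return rng.dirichlet(alpha, size=n_samples)


def lower_quantile_value(samples, value, delta=0.05):
    """Return the delta-quantile of p @ value over the posterior samples.

    For a fixed value function, this is a sample-based estimate of a level
    that the expected next-state value exceeds with posterior probability
    of roughly 1 - delta.
    """
    returns = samples @ value
    return float(np.quantile(returns, delta))


# Fixed value function for states s1, ..., s5, as in the table.
value = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical transition counts observed for one state-action pair.
counts = [2, 0, 1, 0, 3]

samples = posterior_samples(counts)
safe = lower_quantile_value(samples, value)
mean = float((samples @ value).mean())
print(f"posterior mean: {mean:.3f}, 5% lower quantile: {safe:.3f}")
```

The gap between the posterior mean and the lower quantile shrinks as more transitions are observed, which is the data-constrained regime the evaluation targets.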