Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

Authors: Marek Petrik, Reazul Hasan Russel

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we empirically evaluate the safe estimates computed using Hoeffding, BCI, and RSVF ambiguity sets. We start by assuming a true model and generate simulated datasets from it." |
| Researcher Affiliation | Academia | Reazul Hasan Russel, Department of Computer Science, University of New Hampshire (rrussel@cs.unh.edu); Marek Petrik, Department of Computer Science, University of New Hampshire (mpetrik@cs.unh.edu) |
| Pseudocode | Yes | Algorithm 1: "RSVF: Adapted Ambiguity Sets"; "Algorithm 2, in the appendix, summarizes the sort-based method." |
| Open Source Code | No | The paper neither states that source code for the methodology is released nor links to a code repository. |
| Open Datasets | Yes | "We start by assuming a true model and generate simulated datasets from it. Each dataset is then used to construct an ambiguity set and a safe estimate of policy return. We first use the standard River Swim domain for the evaluation [36]." |
| Dataset Splits | No | The paper mentions generating simulated datasets and varying the number of samples, but it does not give the train/validation/test split percentages or sample counts needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper states that computational complexity is not evaluated because it targets data-constrained problems, and it provides no hardware details (e.g., CPU/GPU models or memory) for the experiments. |
| Software Dependencies | No | The paper mentions "MCMC sampling libraries like JAGS, Stan, or others [11]" but gives no version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The value function for the states s1, ..., s5 is fixed to be [1, 2, 3, 4, 5]. RSVF is run for a single iteration with the given value function. The ground truth is generated from the corresponding prior for each one of the problems. All Bayesian methods draw 1,000 samples from the posterior for each state and action. As the prior distribution, we use the uniform Dirichlet distribution over all states." |
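The experiment-setup row describes a Bayesian pipeline: a uniform Dirichlet prior over successor states, a posterior per state-action pair, and 1,000 posterior samples used to build a safe (pessimistic) estimate of return. The sketch below illustrates that pipeline for a single state-action pair; it is not the paper's implementation, and the transition counts, fixed value function, and 5% credible level are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single state-action pair with 5 successor states and the
# fixed value function [1, 2, 3, 4, 5] quoted in the experiment setup.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Assumed simulated transition counts observed for this state-action pair.
counts = np.array([6, 2, 1, 0, 1])

# Uniform Dirichlet prior (all concentration parameters = 1), so the
# posterior over the transition distribution is Dirichlet(counts + 1).
# Draw 1,000 posterior samples, matching the setup described above.
posterior_samples = rng.dirichlet(counts + 1, size=1000)

# Expected next-state value under each sampled transition distribution.
sample_returns = posterior_samples @ values

# A simple credible lower bound (5th percentile, an assumed level) stands
# in for a "safe estimate" of the return at this state-action pair.
nominal_estimate = sample_returns.mean()
safe_estimate = np.quantile(sample_returns, 0.05)
print(f"nominal: {nominal_estimate:.3f}, safe (5%): {safe_estimate:.3f}")
```

The gap between the nominal and safe estimates shrinks as the transition counts grow, which mirrors the paper's evaluation of how safe estimates tighten with more data.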