Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

Authors: Marek Petrik, Reazul Hasan Russel

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we empirically evaluate the safe estimates computed using Hoeffding, BCI, and RSVF ambiguity sets. We start by assuming a true model and generate simulated datasets from it." |
| Researcher Affiliation | Academia | Reazul Hasan Russel, Department of Computer Science, University of New Hampshire (rrussel@cs.unh.edu); Marek Petrik, Department of Computer Science, University of New Hampshire (mpetrik@cs.unh.edu) |
| Pseudocode | Yes | Algorithm 1: "RSVF: Adapted Ambiguity Sets"; "Algorithm 2, in the appendix, summarizes the sort-based method." |
| Open Source Code | No | The paper neither states that source code for the methodology is released nor links to a code repository. |
| Open Datasets | Yes | "We start by assuming a true model and generate simulated datasets from it. Each dataset is then used to construct an ambiguity set and a safe estimate of policy return. We first use the standard River Swim domain for the evaluation [36]." |
| Dataset Splits | No | The paper mentions generating simulated datasets and varying the number of samples, but it does not give the train/validation/test split percentages or sample counts needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper states that computational complexity is not evaluated because it targets data-constrained problems, and it provides no hardware details (e.g., CPU/GPU models or memory) for the experiments. |
| Software Dependencies | No | The paper mentions "MCMC sampling libraries like JAGS, Stan, or others [11]" but gives no version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The value function for the states s1, ..., s5 is fixed to be [1, 2, 3, 4, 5]. RSVF is run for a single iteration with the given value function. The ground truth is generated from the corresponding prior for each one of the problems. All Bayesian methods draw 1,000 samples from the posterior for each state and action. As the prior distribution, we use the uniform Dirichlet distribution over all states." |
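The experiment-setup row describes a Bayesian pipeline: a uniform Dirichlet prior over successor states, a posterior per state-action pair, and 1,000 posterior samples used to build a safe (pessimistic) estimate of return. The sketch below illustrates that pipeline for a single state-action pair; it is not the paper's implementation, and the transition counts, fixed value function, and 5% credible level are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single state-action pair with 5 successor states and the
# fixed value function [1, 2, 3, 4, 5] quoted in the experiment setup.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Assumed simulated transition counts observed for this state-action pair.
counts = np.array([6, 2, 1, 0, 1])

# Uniform Dirichlet prior (all concentration parameters = 1), so the
# posterior over the transition distribution is Dirichlet(counts + 1).
# Draw 1,000 posterior samples, matching the setup described above.
posterior_samples = rng.dirichlet(counts + 1, size=1000)

# Expected next-state value under each sampled transition distribution.
sample_returns = posterior_samples @ values

# A simple credible lower bound (5th percentile, an assumed level) stands
# in for a "safe estimate" of the return at this state-action pair.
nominal_estimate = sample_returns.mean()
safe_estimate = np.quantile(sample_returns, 0.05)
print(f"nominal: {nominal_estimate:.3f}, safe (5%): {safe_estimate:.3f}")
```

The gap between the nominal and safe estimates shrinks as the transition counts grow, which mirrors the paper's evaluation of how safe estimates tighten with more data.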