High Confidence Generalization for Reinforcement Learning

Authors: James Kostas, Yash Chandak, Scott M Jordan, Georgios Theocharous, Philip Thomas

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 9. Experiments and Results: In this section, we run the four algorithms defined by the bounds above on two sets of MDPs: generalization gridworld and dynamic arm simulator one (DAS1) (Blana et al., 2009).
Researcher Affiliation | Collaboration | James E. Kostas (1), Yash Chandak (1), Scott M. Jordan (1), Georgios Theocharous (2), Philip S. Thomas (1). (1) College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA; (2) Adobe Research.
Pseudocode | Yes | Algorithm 1: HCGA Template. Input: feasible set Θ, a set of MDPs M_acc, user-defined threshold j, probability 1 − δ, and high-confidence bounding function b. Output: θ ∈ Θ ∪ {NSF}
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | In this section, we run the four algorithms defined by the bounds above on two sets of MDPs: generalization gridworld and dynamic arm simulator one (DAS1) (Blana et al., 2009).
Dataset Splits | Yes | An HCGA partitions M_acc into M_train and M_safety; M_train is used for training, and M_safety is used for a safety test. ... As a simple heuristic, we partition the data into two sets of equal size in all experiments.
Hardware Specification | No | The paper discusses the computational cost of experiments but does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run them.
Software Dependencies | No | The paper mentions that hyperparameters and experimental details are in supplementary material Section K.1, but does not provide specific software dependencies with version numbers in the main text.
Experiment Setup | No | The paper states that "All hyperparameters and experimental details are given in supplementary material Section K.1" but does not provide specific experimental setup details within the main text.
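
The Pseudocode and Dataset Splits rows describe a template in which the accessible MDPs are split into two equal halves, one used for training and one for a safety test, with the output being either a policy parameter θ or NSF. A minimal Python sketch of that flow follows. It is not the authors' implementation: the function names (split_equal, hoeffding_lower_bound, hcga), the train and evaluate callables, the assumption that returns are normalized to [0, 1], and the rule "pass when a 1 − δ lower bound is at least j" are all illustrative assumptions. In particular, the Hoeffding bound stands in for the bounding function b, which the summary above does not specify.

```python
import math
import random

NSF = None  # hypothetical sentinel for "no solution found"


def split_equal(m_acc, seed=0):
    """Shuffle the accessible MDPs and split them into two equal halves.

    Returns (m_train, m_safety). With an odd count, m_safety gets the extra item.
    """
    items = list(m_acc)
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    return items[:half], items[half:]


def hoeffding_lower_bound(returns, delta, low=0.0, high=1.0):
    """One-sided 1 - delta lower confidence bound on the mean of `returns`.

    Assumes each return is an independent sample lying in [low, high].
    This is an illustrative stand-in for the bounding function b.
    """
    n = len(returns)
    mean = sum(returns) / n
    return mean - (high - low) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def hcga(m_acc, train, evaluate, j, delta, bound=hoeffding_lower_bound, seed=0):
    """Sketch of the template: train on one half, safety-test on the other.

    train(m_train) -> theta and evaluate(theta, mdp) -> float are
    caller-supplied (hypothetical) callables.
    """
    m_train, m_safety = split_equal(m_acc, seed)
    theta = train(m_train)
    # One performance sample per held-out MDP; m_safety is never used in training.
    returns = [evaluate(theta, mdp) for mdp in m_safety]
    # Assumed safety test: accept theta only if the lower bound reaches j.
    return theta if bound(returns, delta) >= j else NSF
```

Keeping M_safety out of training is what lets the held-out returns be treated as fresh samples when the bound is computed; reusing training MDPs in the safety test would invalidate that assumption.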