High Confidence Generalization for Reinforcement Learning
Authors: James Kostas, Yash Chandak, Scott M Jordan, Georgios Theocharous, Philip Thomas
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 9 (Experiments and Results): In this section, we run the four algorithms defined by the bounds above on two sets of MDPs: generalization gridworld and dynamic arm simulator one (DAS1) (Blana et al., 2009). |
| Researcher Affiliation | Collaboration | James E. Kostas¹, Yash Chandak¹, Scott M. Jordan¹, Georgios Theocharous², Philip S. Thomas¹. ¹College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA; ²Adobe Research. |
| Pseudocode | Yes | Algorithm 1: HCGA Template. Input: feasible set Θ, a set of MDPs M_acc, user-defined threshold j, probability 1 − δ, and high-confidence bounding function b. Output: θ ∈ Θ ∪ {NSF}. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In this section, we run the four algorithms defined by the bounds above on two sets of MDPs: generalization gridworld and dynamic arm simulator one (DAS1) (Blana et al., 2009). |
| Dataset Splits | Yes | An HCGA partitions M_acc into M_train and M_safety; M_train is used for training, and M_safety is used for a safety test. ... As a simple heuristic, we partition the data into two sets of equal size in all experiments. |
| Hardware Specification | No | The paper discusses the computational cost of experiments but does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run them. |
| Software Dependencies | No | The paper mentions that hyperparameters and experimental details are in supplementary material Section K.1, but does not provide specific software dependencies with version numbers in the main text. |
| Experiment Setup | No | The paper states that "All hyperparameters and experimental details are given in supplementary material Section K.1" but does not provide specific experimental setup details within the main text. |
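
The Pseudocode and Dataset Splits rows together describe a train-then-test template: split the accessible MDPs in half, train on one half, and return the trained parameters only if a high-confidence bound computed on the other half clears the threshold j. The following is a minimal sketch of that flow, not the authors' implementation. The function names (`hcga`, `hoeffding_lower_bound`), the `train` and `evaluate` callables, the use of Hoeffding's inequality as the bounding function b, the assumption that returns lie in [0, 1], and the rule "pass when the lower bound is at least j" are all illustrative assumptions; `None` stands in for NSF.

```python
import math
import random
from typing import Any, Callable, Optional, Sequence


def hoeffding_lower_bound(returns: Sequence[float], delta: float,
                          lo: float = 0.0, hi: float = 1.0) -> float:
    """(1 - delta)-confidence lower bound on the mean of i.i.d. returns.

    Assumption: every return lies in [lo, hi]. Hoeffding's inequality is
    used here only as one possible choice of bounding function b.
    """
    n = len(returns)
    mean = sum(returns) / n
    return mean - (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def hcga(mdps: Sequence[Any],
         train: Callable[[Sequence[Any]], Any],
         evaluate: Callable[[Any, Any], float],
         j: float,
         delta: float,
         bound: Callable[[Sequence[float], float], float] = hoeffding_lower_bound,
         seed: int = 0) -> Optional[Any]:
    """Hypothetical HCGA-style template; returns theta, or None for NSF."""
    # Shuffle a copy so the caller's collection is left untouched.
    shuffled = list(mdps)
    random.Random(seed).shuffle(shuffled)

    # Equal-size split; with an odd count the safety set gets the extra MDP.
    half = len(shuffled) // 2
    m_train, m_safety = shuffled[:half], shuffled[half:]
    if not m_safety:
        return None  # nothing to test on, so no solution can be certified

    # Candidate selection uses only the training MDPs.
    theta = train(m_train)

    # Safety test: one return per held-out MDP, never seen during training.
    returns = [evaluate(theta, mdp) for mdp in m_safety]

    # Assumed pass rule: the lower bound on expected return must reach j.
    if bound(returns, delta) >= j:
        return theta
    return None  # NSF: No Solution Found


if __name__ == "__main__":
    # Toy stand-ins: "MDPs" are integers and every evaluation returns 0.9.
    toy_mdps = list(range(200))
    result = hcga(toy_mdps, lambda ms: "theta", lambda th, m: 0.9,
                  j=0.7, delta=0.05)
    print(result)
```

Because the safety MDPs are held out from training, the bound applies to the trained parameters without a correction for having selected them on the same data; that is the reason for the split in this sketch.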