Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
High Confidence Generalization for Reinforcement Learning
Authors: James Kostas, Yash Chandak, Scott M Jordan, Georgios Theocharous, Philip Thomas
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 9, "Experiments and Results": In this section, we run the four algorithms defined by the bounds above on two sets of MDPs: generalization gridworld and dynamic arm simulator one (DAS1) (Blana et al., 2009). |
| Researcher Affiliation | Collaboration | James E. Kostas (1), Yash Chandak (1), Scott M. Jordan (1), Georgios Theocharous (2), Philip S. Thomas (1); (1) College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA; (2) Adobe Research. |
| Pseudocode | Yes | Algorithm 1: HCGA Template. Input: feasible set Θ, a set of MDPs M_acc, user-defined threshold j, probability 1 − δ, and high-confidence bounding function b. Output: θ ∈ Θ ∪ {NSF}. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In this section, we run the four algorithms defined by the bounds above on two sets of MDPs: generalization gridworld and dynamic arm simulator one (DAS1) (Blana et al., 2009). |
| Dataset Splits | Yes | An HCGA partitions M_acc into M_train and M_safety; M_train is used for training, and M_safety is used for a safety test. ... As a simple heuristic, we partition the data into two sets of equal size in all experiments. |
| Hardware Specification | No | The paper discusses the computational cost of experiments but does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run them. |
| Software Dependencies | No | The paper mentions that hyperparameters and experimental details are in supplementary material Section K.1, but does not provide specific software dependencies with version numbers in the main text. |
| Experiment Setup | No | The paper states that "All hyperparameters and experimental details are given in supplementary material Section K.1" but does not provide specific experimental setup details within the main text. |
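
The Pseudocode and Dataset Splits rows describe a template that splits the accessible MDPs into equal-sized training and safety sets, trains a candidate on the first, and runs a safety test on the second. The following is a minimal sketch of that flow, not the authors' implementation. It assumes that the safety test passes when a high-confidence lower bound on performance meets the threshold j, and it uses a one-sided Hoeffding bound as a stand-in for the bounding function b. The names `hcga`, `train`, `evaluate`, and `hoeffding_lower_bound` are hypothetical, as is the assumption that returns are normalized to [0, 1].

```python
import math
import random
from typing import Any, Callable, Sequence

NSF = None  # sentinel for "No Solution Found"


def hoeffding_lower_bound(returns: Sequence[float], delta: float,
                          lo: float = 0.0, hi: float = 1.0) -> float:
    """One-sided (1 - delta) Hoeffding lower bound on the mean of bounded samples.

    Stand-in for the bounding function b; the paper's own bounds may differ.
    """
    n = len(returns)
    mean = sum(returns) / n
    return mean - (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def hcga(mdps: Sequence[Any],
         train: Callable[[Sequence[Any]], Any],
         evaluate: Callable[[Any, Any], float],
         j: float,
         delta: float,
         bound: Callable[[Sequence[float], float], float] = hoeffding_lower_bound,
         seed: int = 0) -> Any:
    """Sketch of the template: partition, train a candidate, run a safety test."""
    shuffled = list(mdps)
    random.Random(seed).shuffle(shuffled)

    # Equal-sized split, following the heuristic quoted in the table.
    half = len(shuffled) // 2
    m_train, m_safety = shuffled[:half], shuffled[half:]

    # Candidate selection only ever sees the training MDPs.
    theta = train(m_train)

    # Safety test on held-out MDPs (assumed form: lower bound must reach j).
    returns = [evaluate(theta, mdp) for mdp in m_safety]
    if bound(returns, delta) >= j:
        return theta
    return NSF
```

Keeping the safety set untouched during training is what allows the bound to be applied to the held-out returns; any concrete `train` and `evaluate` routines would need to be supplied by the user.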