Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
Authors: Matthew T Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Greg Farquhar, Shimon Whiteson, Jakob Foerster
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a series of experiments, we show that GROOVE achieves superior generalization to LPG, and evaluate AR against baseline metrics from UED, identifying it as a critical component of environment design in this setting. (Abstract) and Our experiments are designed to determine (1) how the meta-training distribution impacts OOD generalization in PMO, (2) how well AR identifies informative levels for generalization, and (3) the effectiveness of GROOVE at generating curricula for generalization using this metric. (Section 4) |
| Researcher Affiliation | Collaboration | Matthew T. Jackson (University of Oxford); Minqi Jiang (UCL); Jack Parker-Holder (Google DeepMind); Risto Vuorio (University of Oxford); Chris Lu (University of Oxford); Gregory Farquhar (Google DeepMind); Shimon Whiteson (University of Oxford); Jakob N. Foerster (University of Oxford) |
| Pseudocode | Yes | The meta-training loop for GROOVE is presented in Algorithm 1 and Figure 2. (Section 3.3) and Algorithm 1 GROOVE meta-training |
| Open Source Code | Yes | As well as being the first complete and open-source implementation of LPG...the project repository is available at https://github.com/EmptyJackson/groove. |
| Open Datasets | Yes | For meta-training, we use a generalization of the tabular Grid-World environment presented by Oh et al. [2020]. (Section 4.1) and To approximate this, we evaluate on Atari [Bellemare et al., 2013], an archetypal RL benchmark, as well as its simplified counterpart MinAtar [Young and Tian, 2019] for our intermediate results. (Section 4.1) |
| Dataset Splits | No | The paper refers to 'meta-training' and 'meta-testing' phases on different environments, and notes that hyperparameters were tuned using LPG on Grid-World, but it does not provide train/validation/test splits (as percentages or counts) that would allow the data partitioning of the primary experimental datasets (Grid-World, Atari, MinAtar) to be reproduced. |
| Hardware Specification | Yes | We implement GROOVE and LPG in JAX [Bradbury et al., 2018], resulting in a meta-training time of 3 hours on a single V100 GPU. (Introduction) and Our experiments were executed on two to five servers, containing eight GPUs each (ranging in performance from 1080-Ti to V100). (Section 4.1) |
| Software Dependencies | No | The paper states 'We implement GROOVE and LPG in JAX [Bradbury et al., 2018]' but does not provide specific version numbers for JAX or other software dependencies. |
| Experiment Setup | Yes | Model hyperparameters can be found in the supplementary materials and the project repository is available at https://github.com/EmptyJackson/groove. (Section 4.1) and Table 2: GROOVE/LPG hyperparameters and Table 3: Agent hyperparameters and architecture descriptions (Appendix B). |