Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
Authors: Matthew T Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Greg Farquhar, Shimon Whiteson, Jakob Foerster
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a series of experiments, we show that GROOVE achieves superior generalization to LPG, and evaluate AR against baseline metrics from UED, identifying it as a critical component of environment design in this setting. (Abstract) and Our experiments are designed to determine (1) how the meta-training distribution impacts OOD generalization in PMO, (2) how well AR identifies informative levels for generalization, and (3) the effectiveness of GROOVE at generating curricula for generalization using this metric. (Section 4) |
| Researcher Affiliation | Collaboration | Matthew T. Jackson (University of Oxford); Minqi Jiang (UCL); Jack Parker-Holder (Google DeepMind); Risto Vuorio (University of Oxford); Chris Lu (University of Oxford); Gregory Farquhar (Google DeepMind); Shimon Whiteson (University of Oxford); Jakob N. Foerster (University of Oxford) |
| Pseudocode | Yes | The meta-training loop for GROOVE is presented in Algorithm 1 and Figure 2. (Section 3.3) and Algorithm 1 GROOVE meta-training |
| Open Source Code | Yes | As well as being the first complete and open-source implementation of LPG...the project repository is available at https://github.com/EmptyJackson/groove. |
| Open Datasets | Yes | For meta-training, we use a generalization of the tabular Grid-World environment presented by Oh et al. [2020]. (Section 4.1) and To approximate this, we evaluate on Atari [Bellemare et al., 2013], an archetypal RL benchmark, as well as its simplified counterpart MinAtar [Young and Tian, 2019] for our intermediate results. (Section 4.1) |
| Dataset Splits | No | The paper refers to 'meta-training' and 'meta-testing' phases on different environments, and notes that hyperparameters were tuned using LPG on Grid-World, but it does not provide train/validation/test splits (as percentages or counts) that would allow the data partitioning of the primary experimental datasets (Grid-World, Atari, MinAtar) to be reproduced. |
| Hardware Specification | Yes | We implement GROOVE and LPG in JAX [Bradbury et al., 2018], resulting in a meta-training time of 3 hours on a single V100 GPU. (Introduction) and Our experiments were executed on two to five servers, containing eight GPUs each (ranging in performance from 1080-Ti to V100). (Section 4.1) |
| Software Dependencies | No | The paper states 'We implement GROOVE and LPG in JAX [Bradbury et al., 2018]' but does not provide specific version numbers for JAX or other software dependencies. |
| Experiment Setup | Yes | Model hyperparameters can be found in the supplementary materials and the project repository is available at https://github.com/EmptyJackson/groove. (Section 4.1) and Table 2: GROOVE/LPG hyperparameters and Table 3: Agent hyperparameters and architecture descriptions (Appendix B). |