Replay-Guided Adversarial Environment Design
Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in Section 6 investigate the learning dynamics of PLR⊥, REPAIRED, and their replay-free counterparts on a challenging maze domain and a novel continuous-control UED setting based on the popular Car Racing environment [5]. In both of these highly distinct settings, our methods provide significant improvements over PLR and PAIRED, producing agents that can perform out-of-distribution (OOD) generalization to a variety of human-designed mazes and Formula 1 tracks. |
| Researcher Affiliation | Collaboration | Minqi Jiang (UCL, FAIR); Michael Dennis (UC Berkeley); Jack Parker-Holder (University of Oxford); Jakob Foerster (FAIR); Edward Grefenstette (UCL, FAIR); Tim Rocktäschel (UCL, FAIR) |
| Pseudocode | Yes | Algorithm 1: Robust PLR (PLR⊥) (see the sketch after the table) |
| Open Source Code | Yes | We open source our methods at https://github.com/facebookresearch/dcd. |
| Open Datasets | No | The paper refers to environments such as the 'maze domain' and the 'Car Racing environment' and states that these are 'based on' or 'extended versions' of existing environments such as OpenAI Gym, but it does not provide concrete access information (link, DOI, or formal citation with authors/year) for specific datasets used for training. |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix D.1 and D.2. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | Yes | All experiments were implemented in Python 3.7.6. |
| Experiment Setup | Yes | We provide environment descriptions alongside model and hyperparameter choices in Appendix D. |
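
For context on the pseudocode row above, the following is a minimal Python sketch of the Robust PLR (PLR⊥) outer loop described by Algorithm 1. It is an illustrative simplification, not code from the facebookresearch/dcd repository: the `LevelReplayBuffer`, `robust_plr_step`, `generate_level`, `score_fn`, and the `agent.collect` / `agent.update` interface are hypothetical placeholders, and the real implementation additionally uses rank-based prioritization and staleness-aware sampling.

```python
import random


class LevelReplayBuffer:
    """Minimal prioritized level buffer: stores [level, score] entries and
    samples entries with probability proportional to their score."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = []  # list of [level, score]

    def add(self, level, score):
        self.entries.append([level, score])
        if len(self.entries) > self.capacity:
            # Simplified eviction: drop the lowest-scoring level.
            self.entries.sort(key=lambda e: e[1])
            self.entries.pop(0)

    def sample(self):
        total = sum(score for _, score in self.entries)
        if total <= 0:
            return random.choice(self.entries)
        threshold = random.uniform(0, total)
        acc = 0.0
        for entry in self.entries:
            acc += entry[1]
            if acc >= threshold:
                return entry
        return self.entries[-1]


def robust_plr_step(agent, buffer, generate_level, score_fn, p_replay=0.5):
    """One outer-loop iteration of a Robust-PLR-style update (sketch).

    With probability 1 - p_replay (or when the buffer is empty), a new level
    is generated and only evaluated: its regret score is estimated and the
    level is stored, but no gradient update is performed. Otherwise a
    high-scoring level is replayed and the agent is trained on it.
    """
    if not buffer.entries or random.random() > p_replay:
        level = generate_level()
        trajectory = agent.collect(level)   # rollout only, no parameter update
        buffer.add(level, score_fn(trajectory))
    else:
        entry = buffer.sample()
        trajectory = agent.collect(entry[0])
        agent.update(trajectory)            # gradient update on the replayed level
        entry[1] = score_fn(trajectory)     # refresh the level's replay score
```

The property this sketch preserves is that gradient updates happen only on replayed levels; newly generated levels are merely scored and stored, which is what distinguishes PLR⊥ from vanilla PLR.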