Replay-Guided Adversarial Environment Design

Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments in Section 6 investigate the learning dynamics of PLR⊥, REPAIRED, and their replay-free counterparts on a challenging maze domain and a novel continuous-control UED setting based on the popular CarRacing environment [5]. In both of these highly distinct settings, our methods provide significant improvements over PLR and PAIRED, producing agents that can perform out-of-distribution (OOD) generalization to a variety of human-designed mazes and Formula 1 tracks.
Researcher Affiliation | Collaboration | Minqi Jiang (UCL, FAIR); Michael Dennis (UC Berkeley); Jack Parker-Holder (University of Oxford); Jakob Foerster (FAIR); Edward Grefenstette (UCL, FAIR); Tim Rocktäschel (UCL, FAIR)
Pseudocode | Yes | Algorithm 1: Robust PLR (PLR⊥)
Open Source Code | Yes | We open source our methods at https://github.com/facebookresearch/dcd.
Open Datasets | No | The paper refers to environments such as the 'maze domain' and the 'CarRacing environment' and states that these are 'based on' or 'extended versions' of existing environments like OpenAI Gym, but it does not provide concrete access information (link, DOI, or formal citation with authors and year) for specific datasets used for training.
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix D.1 and D.2.
Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce RTX 2080 Ti GPU.
Software Dependencies | Yes | All experiments were implemented in Python 3.7.6.
Experiment Setup | Yes | We provide environment descriptions alongside model and hyperparameter choices in Appendix D.
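
To make the Pseudocode row more concrete, the following is a minimal Python sketch of the control flow that Algorithm 1 (Robust PLR, PLR⊥) describes: gradient updates happen only on levels replayed from the buffer, while newly generated levels are rolled out solely to score them for future replay. The helper names (collect_rollout, score_level, the buffer methods) and the replay probability are illustrative assumptions, not the authors' actual API or settings.

```python
import random

def robust_plr_step(agent, level_buffer, level_generator, replay_prob=0.5):
    """One iteration of the Robust PLR (PLR⊥) control flow from Algorithm 1 (sketch).

    Gradient updates are performed only on levels replayed from the buffer; freshly
    generated levels are rolled out without an update and used only to score the
    level for future replay. Helper names (collect_rollout, score_level) and the
    buffer/agent interfaces are illustrative assumptions, not the authors' API.
    """
    if len(level_buffer) > 0 and random.random() < replay_prob:
        # Replay decision: train on a high-regret level sampled from the buffer,
        # prioritized by its score and staleness.
        level = level_buffer.sample()
        rollout = collect_rollout(agent, level)
        agent.update(rollout)                                   # gradient update
        level_buffer.update_score(level, score_level(rollout))
    else:
        # Exploratory decision: evaluate a new level, but do NOT update the agent;
        # the rollout is used only to decide whether the level enters the buffer.
        level = level_generator.sample()
        rollout = collect_rollout(agent, level)
        level_buffer.maybe_add(level, score_level(rollout))
```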
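The replay scores driving that prioritization are described in the paper as estimates of regret; a common instantiation in the PLR line of work is the positive value loss computed from GAE advantages. The sketch below is a hedged illustration of that scoring function; the discount and λ values are placeholders rather than the paper's tuned hyperparameters (those are in Appendix D).

```python
import numpy as np

def positive_value_loss(rewards, values, gamma=0.995, gae_lambda=0.95):
    """Average positive GAE advantage over an episode, used as a regret proxy.

    `rewards` has length T; `values` has length T + 1 (including a bootstrap value).
    The gamma/lambda values here are placeholders, not the paper's settings.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    deltas = rewards + gamma * values[1:] - values[:-1]   # one-step TD errors

    advantages = np.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        gae = deltas[t] + gamma * gae_lambda * gae
        advantages[t] = gae

    # Clip negative advantages to zero and average: the "positive value loss" score.
    return float(np.maximum(advantages, 0.0).mean())
```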
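Regarding the Open Datasets row: the maze and racing domains extend existing open-source environments rather than fixed datasets. As a hedged illustration, the snippet below instantiates the standard Gym CarRacing environment on which the continuous-control UED setting is based; the adversarially designed mazes and Formula 1-style tracks themselves are produced by the methods in the authors' repository and are not stock Gym environments.

```python
import gym  # OpenAI Gym, the base library the UED environments extend

# "CarRacing-v0" is the stock Gym continuous-control environment (requires box2d);
# the paper's F1-style tracks and adversarial mazes come from the authors' code,
# not from standard Gym, so this shows only the underlying base environment.
env = gym.make("CarRacing-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```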
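Finally, for the Hardware Specification and Software Dependencies rows, a quick sanity check of the local setup can help before attempting reproduction. The snippet assumes a PyTorch-based installation, which is an assumption rather than something stated in the rows above.

```python
import sys
import torch  # assumes a PyTorch-based installation (an assumption, not stated above)

print("Python:", sys.version.split()[0])              # paper reports Python 3.7.6
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))      # paper reports an RTX 2080 Ti
```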