DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design

Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We supplement our findings with an empirical evaluation of different sampling strategies in the Procgen benchmark (Cobbe et al., 2020), and observe a strong correlation between I(π; L) and the generalisation gap. ... We measure DRED's capabilities in a gridworld navigation task that was designed to highlight this trade-off. ... Our experiments seek to answer the following questions: 1) How important is it to remain grounded to the target CMDP when generating additional levels, instead of simply maximising level diversity? 2) Is DRED successful in grounding the training distribution to the target CMDP, and does it improve transfer to held-out levels and edge cases? ... In Figures 6 and 16, we evaluate transfer to in-context Hardcore levels. (See the note on the generalisation gap after this table.)
Researcher Affiliation | Collaboration | 1 School of Informatics, University of Edinburgh; 2 Huawei. Correspondence to: Samuel Garcin <s.garcin@ed.ac.uk>.
Pseudocode | Yes | Algorithm 1: Data-regularised environment design. (See the illustrative Python sketch after this table.)
Open Source Code | Yes | Our code and experimental data are available at https://github.com/uoe-agents/dred.
Open Datasets | Yes | We open-source our code for specifying arbitrary CMDPs in Minigrid and generating their associated level sets (we describe the generation process in detail in Appendix D). We also provide a dataset of 1.5M procedurally generated Minigrid base layouts to facilitate level set generation.
Dataset Splits | No | No explicit training/validation/test splits with percentages or counts are provided for the main RL experiments in the main text. Appendix E.3 mentions cross-validation for hyperparameter tuning during VAE pre-training, but not for the main agent training.
Hardware Specification | No | The paper mentions only 'a single GPU and 10 CPUs' and a 'GPU-equipped laptop', without naming exact GPU/CPU models or other hardware details.
Software Dependencies | No | The paper names the algorithms used (PPO, VAE, GCN, GIN) and refers to other papers for architectures and hyperparameters, but does not list specific software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | We employ the same ResNet policy architecture and PPO hyperparameters (identical for all games) as (Cobbe et al., 2020), which we reference in Table 8. ... We use the recurrent PPO agent and hyperparameters employed in (Parker-Holder et al., 2022) for all our experiments. ... Weights are optimised using Adam and we employ the same hyperparameters in all experiments, reported in Table 8. (See the optimiser wiring sketch after this table.)
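
For context on the Research Type row: the generalisation gap it quotes is, in its standard form, the difference between the agent's expected return on the training levels and its expected return under the full target level distribution. A LaTeX sketch of this definition (π and L are the excerpt's symbols for the policy and the training level set; the paper's exact formulation may differ):

    \mathrm{GenGap}(\pi) \;=\; \mathbb{E}_{l \sim p_{\text{train}}}\!\left[ V^{\pi}_{l} \right] \;-\; \mathbb{E}_{l \sim p_{\text{target}}}\!\left[ V^{\pi}_{l} \right]

The quoted finding is that this gap correlates strongly with I(π; L), the mutual information between the learned policy and the identity of the training levels.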
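
To make the Pseudocode row concrete, here is a minimal Python sketch of the kind of sampling loop Algorithm 1 (data-regularised environment design) describes: levels are either replayed from a buffer prioritised by a learning-potential score, or generated by interpolating between dataset levels in the latent space of a generative model trained on that dataset, which keeps new levels grounded to the target distribution. All names here (vae, agent, score, etc.) are our placeholders, not the authors' API, and the branching probabilities are illustrative.

    import random

    def dred_loop(level_dataset, vae, agent, num_updates, p_replay=0.5):
        """Illustrative sketch only; see Algorithm 1 in the paper."""
        replay_buffer = []  # (level, learning-potential score) pairs
        for _ in range(num_updates):
            if replay_buffer and random.random() < p_replay:
                # Replay the stored level with the highest score.
                level, _ = max(replay_buffer, key=lambda pair: pair[1])
            elif random.random() < 0.5:
                # Ground a new level to the dataset: decode a point
                # between the latent codes of two dataset levels.
                a, b = random.sample(level_dataset, 2)
                level = vae.decode(0.5 * (vae.encode(a) + vae.encode(b)))
            else:
                # Otherwise sample a dataset level directly.
                level = random.choice(level_dataset)
            trajectory = agent.collect(level)
            # Score the level by an estimate of its learning potential,
            # e.g. positive value loss, as in prioritised level replay.
            replay_buffer.append((level, agent.score(trajectory)))
            agent.update(trajectory)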
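
Finally, for the Experiment Setup row: the quoted setup amounts to a standard PPO pipeline optimised with Adam. A minimal PyTorch sketch of that wiring (the tiny network and the learning-rate/epsilon values below are placeholders; the reported settings live in the paper's Table 8 and the referenced works):

    import torch
    import torch.nn as nn

    # Stand-in for the ResNet policy of Cobbe et al. (2020); the real
    # architecture is deeper and its hyperparameters come from Table 8.
    policy = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Flatten(), nn.LazyLinear(15),  # 15 discrete Procgen actions
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4, eps=1e-5)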