DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V Albrecht
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We supplement our findings with an empirical evaluation of different sampling strategies in the Procgen benchmark (Cobbe et al., 2020), and observe a strong correlation between I(π; L) and the generalisation gap. ... We measure DRED's capabilities in a gridworld navigation task that was designed to highlight this trade-off. ... Our experiments seek to answer the following questions: 1) How important is it to remain grounded to the target CMDP when generating additional levels, instead of simply maximising level diversity? 2) Is DRED successful in grounding the training distribution to the target CMDP, and does it improve transfer to held-out levels and edge-cases? ... In Figures 6 and 16, we evaluate transfer to in-context Hardcore levels. |
| Researcher Affiliation | Collaboration | School of Informatics, University of Edinburgh; Huawei. Correspondence to: Samuel Garcin <s.garcin@ed.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 Data-regularised environment design |
| Open Source Code | Yes | Our code and experimental data are available at https://github.com/uoe-agents/dred. |
| Open Datasets | Yes | We open-source our code for specifying arbitrary CMDPs in Minigrid and generate their associated level sets (we describe the generation process in detail in Appendix D). We also provide a dataset of 1.5M procedurally generated minigrid base layouts to facilitate level set generation. |
| Dataset Splits | No | No explicit training/validation/test splits with percentages or counts are provided for the main RL experiments in the main text. Appendix E.3 mentions 'cross-validation for hyperparameter tuning' for the VAE pre-training, but no splits are specified for the main agent training. |
| Hardware Specification | No | The paper vaguely mentions 'a single GPU and 10 CPUs' and 'GPU-equipped laptop' but does not specify exact GPU/CPU models or other hardware details. |
| Software Dependencies | No | The paper names algorithms and model classes (PPO, VAE, GCN, GIN) and refers to other papers for architectures and hyperparameters, but does not list specific software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | We employ the same ResNet policy architecture and PPO hyperparameters (identical for all games) as (Cobbe et al., 2020), which we reference in Table 8. ... We use the recurrent PPO agent and hyperparameters employed in (Parker-Holder et al., 2022) for all our experiments. ... Weights are optimised using Adam and we employ the same hyperparameters in all experiments, reported in Table 8. |