Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL
Authors: Bogdan Mazoure, Ahmed M Ahmed, R Devon Hjelm, Andrey Kolobov, Patrick MacAlpine
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments ablate various components of CTRL and demonstrate that in combination with PPO it achieves better generalization performance on the challenging Procgen benchmark suite (Cobbe et al., 2020). |
| Researcher Affiliation | Collaboration | Bogdan Mazoure bogdan.mazoure@mail.mcgill.ca McGill University, Quebec AI Institute; Ahmed M. Ahmed ahmedah@stanford.edu Stanford University; Patrick MacAlpine patrick.macalpine@sony.com Sony AI; R Devon Hjelm devon.hjelm@microsoft.com Université de Montréal, Quebec AI Institute, Microsoft Research; Andrey Kolobov akolobov@microsoft.com Microsoft Research |
| Pseudocode | Yes | CTRL's pseudocode is presented in Algorithm 1 in Appendix 8.1. |
| Open Source Code | Yes | Code link: https://github.com/bmazoure/ctrl_public |
| Open Datasets | Yes | We compare CTRL against strong RL baselines: DAAC (Raileanu and Fergus, 2021), the current state-of-the-art on the challenging generalization benchmark suite Procgen (Cobbe et al., 2020). |
| Dataset Splits | No | The paper mentions training on N = 200 levels and evaluating on tasks not seen during training (drawn from T \ T_N), which implies a train/test split, but it does not explicitly describe a separate validation split or its proportion/methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software like 'IMPALA architecture', 'PPO', and 'Adam' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Table 2: Experiments parameters |
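The train/test split noted in the Dataset Splits row can be sketched as follows. This is a minimal illustration of the Procgen convention of identifying levels by integer seeds and reserving the first N = 200 seeds for training; the helper names are illustrative and not taken from the paper or its code release.

```python
# Hedged sketch: Procgen-style train/test level split by seed.
# The paper trains on N = 200 levels and evaluates on held-out levels;
# the constants mirror that setup, the function names are assumptions.

N_TRAIN_LEVELS = 200


def is_training_level(seed: int) -> bool:
    """Levels are identified by integer seeds; the first
    N_TRAIN_LEVELS seeds form the training set."""
    return 0 <= seed < N_TRAIN_LEVELS


def split_levels(seeds):
    """Partition a collection of level seeds into train/test sets."""
    train = [s for s in seeds if is_training_level(s)]
    test = [s for s in seeds if not is_training_level(s)]
    return train, test


# Example: 500 candidate seeds -> 200 training levels, 300 held-out.
train, test = split_levels(range(500))
```

Evaluating only on seeds outside the training range is what makes the reported performance a zero-shot generalization measure rather than a training-level score.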