Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

Authors: Bogdan Mazoure, Ahmed M Ahmed, R Devon Hjelm, Andrey Kolobov, Patrick MacAlpine

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments ablate various components of CTRL and demonstrate that in combination with PPO it achieves better generalization performance on the challenging Procgen benchmark suite (Cobbe et al., 2020)."
Researcher Affiliation | Collaboration | Bogdan Mazoure (bogdan.mazoure@mail.mcgill.ca), McGill University, Quebec AI Institute; Ahmed M. Ahmed (ahmedah@stanford.edu), Stanford University; Patrick MacAlpine (patrick.macalpine@sony.com), Sony AI; R Devon Hjelm (devon.hjelm@microsoft.com), Université de Montréal, Quebec AI Institute, Microsoft Research; Andrey Kolobov (akolobov@microsoft.com), Microsoft Research
Pseudocode | Yes | "CTRL's pseudocode is presented in Algorithm 1 in Appendix 8.1."
Open Source Code | Yes | Code link: https://github.com/bmazoure/ctrl_public
Open Datasets | Yes | "We compare CTRL against strong RL baselines: DAAC (Raileanu and Fergus, 2021), the current state-of-the-art on the challenging generalization benchmark suite Procgen (Cobbe et al., 2020)."
Dataset Splits | No | The paper mentions training on N=200 levels and evaluating on tasks not seen during training (drawn from d(T \ T_N)), which implies a train/test split, but it does not explicitly describe a separate validation split or its proportion/methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software such as the IMPALA architecture, PPO, and Adam, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Table 2: Experiment parameters."