Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

Authors: Bogdan Mazoure, Ahmed M Ahmed, R Devon Hjelm, Andrey Kolobov, Patrick MacAlpine

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments ablate various components of CTRL and demonstrate that in combination with PPO it achieves better generalization performance on the challenging Procgen benchmark suite (Cobbe et al., 2020)."
Researcher Affiliation | Collaboration | Bogdan Mazoure (bogdan.mazoure@mail.mcgill.ca), McGill University, Quebec AI Institute; Ahmed M. Ahmed (ahmedah@stanford.edu), Stanford University; Patrick MacAlpine (patrick.macalpine@sony.com), Sony AI; R Devon Hjelm (devon.hjelm@microsoft.com), Université de Montréal, Quebec AI Institute, Microsoft Research; Andrey Kolobov (akolobov@microsoft.com), Microsoft Research
Pseudocode | Yes | "CTRL's pseudocode is presented in Algorithm 1 in Appendix 8.1."
Open Source Code | Yes | Code link: https://github.com/bmazoure/ctrl_public
Open Datasets | Yes | "We compare CTRL against strong RL baselines: DAAC (Raileanu and Fergus, 2021), the current state-of-the-art on the challenging generalization benchmark suite Procgen (Cobbe et al., 2020)."
Dataset Splits | No | The paper mentions training on N=200 levels and evaluating on tasks not seen during training (drawn from d(T \ T_N)), which implies a train/test split, but it does not explicitly describe a separate validation split or its proportion/methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software such as the IMPALA architecture, PPO, and Adam, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Table 2: Experiment parameters."