Pretraining Representations for Data-Efficient Reinforcement Learning

Authors: Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, R Devon Hjelm, Philip Bachman, Aaron C. Courville

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experimental Details", "Our evaluation metric for an agent on a game is human-normalized score (HNS)", "Table 2: HNS on Atari100k for SGI and baselines."
Researcher Affiliation | Collaboration | Max Schwarzer 1,2, Nitarshan Rajkumar 1,2, Michael Noukhovitch 1,2, Ankesh Anand 1,2, Laurent Charlin 1,3,5, Devon Hjelm 1,4, Philip Bachman 4, Aaron Courville 1,2,5 (1 Mila, 2 Université de Montréal, 3 HEC Montréal, 4 Microsoft Research, 5 CIFAR)
Pseudocode | Yes | We provide an overview of these components in Figure 1 and describe them in greater detail below; we also provide detailed pseudocode in Appendix D.
Open Source Code | Yes | We provide code associated with this work at https://github.com/mila-iqia/SGI.
Open Datasets | Yes | To address the first challenge, we focus our experimentation on the Atari 100k benchmark introduced by Kaiser et al. (2019)... We opt to use the publicly-available DQN Replay dataset (Agarwal et al., 2020), which contains data from training for 50M steps...
Dataset Splits | No | The paper states 'We calculate this per game by averaging scores over 100 evaluation trajectories at the end of training, and across 10 random seeds for training,' and refers to the Atari 100k benchmark, which limits the number of environment interaction steps. While it describes the evaluation procedure and the pretraining dataset sizes, it does not explicitly specify a separate validation split (e.g., percentages or counts used for hyperparameter tuning) distinct from the training and test data. (A worked sketch of the quoted evaluation metric and averaging procedure follows this table.)
Hardware Specification | No | The paper acknowledges 'We would like to thank Mila and Compute Canada for computational resources,' but does not specify hardware details such as GPU models, CPU models, or memory specifications used to run the experiments.
Software Dependencies | No | The paper cites 'PyTorch' and 'NumPy' in its references, states that 'our implementation matches that of SPR and is based on its publicly-released code', and refers to 'Dopamine baselines'. However, it does not provide version numbers for these software components or for any other libraries needed for reproducibility.
Experiment Setup | Yes | Full implementation and hyperparameter details are provided in Appendix C. We optimize our three representation learning objectives jointly during unsupervised pretraining... and lower the learning rates for the pretrained encoder and dynamics model by two orders of magnitude...
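As noted in the Dataset Splits row, the paper reports human-normalized score (HNS) per game, averaged over 100 evaluation trajectories at the end of training and over 10 training seeds. The following is a minimal NumPy sketch of that aggregation, assuming the standard Atari normalization HNS = (agent - random) / (human - random); the raw returns and per-game reference scores below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def human_normalized_score(agent_score, random_score, human_score):
    """Standard Atari human-normalized score: 0 matches a random policy,
    1 matches the human reference score."""
    return (agent_score - random_score) / (human_score - random_score)

# Hypothetical raw returns for one game: 10 training seeds x 100 evaluation trajectories.
rng = np.random.default_rng(0)
raw_returns = rng.normal(loc=3000.0, scale=400.0, size=(10, 100))

# Illustrative per-game reference scores (not taken from the paper).
RANDOM_SCORE, HUMAN_SCORE = 200.0, 7000.0

# Average over the 100 evaluation trajectories for each seed, normalize,
# then average across the 10 seeds to obtain the per-game HNS.
per_seed_mean = raw_returns.mean(axis=1)
per_seed_hns = human_normalized_score(per_seed_mean, RANDOM_SCORE, HUMAN_SCORE)
game_hns = per_seed_hns.mean()
print(f"per-game HNS: {game_hns:.3f}")
```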
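The Experiment Setup row quotes that the learning rates for the pretrained encoder and dynamics model are lowered by two orders of magnitude during fine-tuning. Below is a minimal PyTorch sketch of how such a configuration can be expressed with optimizer parameter groups; the module definitions, optimizer choice, and base learning rate are assumptions for illustration, and the paper's actual architecture and hyperparameters are given in its Appendix C.

```python
from torch import nn, optim

# Hypothetical stand-ins for a pretrained encoder, a pretrained dynamics
# model, and a freshly initialized Q-learning head (not the paper's architecture).
encoder = nn.Sequential(nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(), nn.Flatten())
dynamics_model = nn.Linear(32 * 20 * 20, 512)  # matches 84x84 frame-stack inputs
q_head = nn.Linear(512, 18)

BASE_LR = 1e-4  # illustrative base fine-tuning learning rate

# Pretrained components get a learning rate two orders of magnitude below the
# base rate, while the newly initialized head keeps the full rate.
optimizer = optim.Adam([
    {"params": encoder.parameters(), "lr": BASE_LR / 100},
    {"params": dynamics_model.parameters(), "lr": BASE_LR / 100},
    {"params": q_head.parameters(), "lr": BASE_LR},
])
```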