Pretraining Representations for Data-Efficient Reinforcement Learning
Authors: Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, R Devon Hjelm, Philip Bachman, Aaron C. Courville
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 Experimental Details"; "Our evaluation metric for an agent on a game is human-normalized score (HNS)"; "Table 2: HNS on Atari100k for SGI and baselines." |
| Researcher Affiliation | Collaboration | Max Schwarzer 1,2, Nitarshan Rajkumar 1,2, Michael Noukhovitch 1,2, Ankesh Anand 1,2, Laurent Charlin 1,3,5, Devon Hjelm 1,4, Philip Bachman 4, Aaron Courville 1,2,5 (1 Mila, 2 Université de Montréal, 3 HEC Montréal, 4 Microsoft Research, 5 CIFAR) |
| Pseudocode | Yes | We provide an overview of these components in Figure 1 and describe them in greater detail below; we also provide detailed pseudocode in Appendix D. |
| Open Source Code | Yes | We provide code associated with this work at https://github.com/mila-iqia/SGI. |
| Open Datasets | Yes | To address the first challenge, we focus our experimentation on the Atari 100k benchmark introduced by Kaiser et al. (2019)... We opt to use the publicly-available DQN Replay dataset (Agarwal et al., 2020), which contains data from training for 50M steps... |
| Dataset Splits | No | The paper states: 'We calculate this per game by averaging scores over 100 evaluation trajectories at the end of training, and across 10 random seeds for training.' and refers to the 'Atari 100k benchmark' which limits interaction steps. While it describes evaluation procedures and pretraining dataset sizes, it does not explicitly provide information about a separate validation dataset split (e.g., percentages or counts for hyperparameter tuning) distinct from the training and test sets. |
| Hardware Specification | No | The paper mentions 'We would like to thank Mila and Compute Canada for computational resources.' but does not specify any particular hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'NumPy' in its references, states that 'our implementation matches that of SPR and is based on its publicly-released code', and refers to 'Dopamine baselines'. However, it does not provide version numbers for these software components or for any other libraries needed for reproducibility. |
| Experiment Setup | Yes | Full implementation and hyperparameter details are provided in Appendix C. We optimize our three representation learning objectives jointly during unsupervised pretraining... and lower the learning rates for the pretrained encoder and dynamics model by two orders of magnitude... |
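For reference, the human-normalized score (HNS) quoted above as the paper's evaluation metric can be sketched as follows. This is a minimal illustration of the standard HNS formula; the per-game reference scores in the example are hypothetical placeholders, not values from the paper.

```python
def human_normalized_score(agent_score: float,
                           random_score: float,
                           human_score: float) -> float:
    """HNS rescales an agent's raw game score so that random play
    maps to 0.0 and the human reference score maps to 1.0:

        HNS = (agent - random) / (human - random)
    """
    return (agent_score - random_score) / (human_score - random_score)


# Example with made-up reference scores for a hypothetical game:
# an agent scoring 500 where random play scores 100 and the human
# reference is 900 achieves an HNS of 0.5.
hns = human_normalized_score(agent_score=500.0,
                             random_score=100.0,
                             human_score=900.0)
print(hns)  # 0.5
```

In the benchmark setting described in the table, this per-game HNS would then be averaged over evaluation trajectories and random seeds before being aggregated across games.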