Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

Authors: David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our method on Atari game-playing environments and show that with target environment data equivalent to only 12 minutes of gameplay, we can significantly improve game performance and generalization capability compared to other approaches.
Researcher Affiliation Collaboration Mila; McGill University; Google DeepMind; University of California, Berkeley.
Pseudocode Yes Table 1: A summary of ALPT.
Open Source Code Yes We make the source code publicly available for our Maze experiment only at this time. The details can be found at: https://anonymous.4open.science/r/alpt_maze-5927/README.md.
Open Datasets Yes As in Lee et al. (2022), we use the standard offline RL Atari datasets from RL Unplugged (Gulcehre et al., 2020).
Dataset Splits No The paper describes pretraining and finetuning phases, and mentions evaluating performance, but does not explicitly specify a validation dataset split used during training for purposes like hyperparameter tuning or early stopping.
Hardware Specification No The paper describes architecture details and training hyperparameters (e.g., 'transformer with 6 layers of 8 heads each and hidden size 512', 'batch size of 256'), but does not specify any particular hardware used for training or evaluation, such as CPU or GPU models.
Software Dependencies No The paper mentions using a 'transformer' architecture, 'GPT-2 transformer architecture', 'decision transformers (DT)', and references 'RL Unplugged (Gulcehre et al., 2020)', but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes Our architecture and training protocol follow the multi-game Atari setting outlined in Lee et al. (2022). Specifically, we use a transformer with 6 layers of 8 heads each and hidden size 512. The rest of the architecture and training hyperparameters remain unchanged for experiments on Atari. For the Maze navigation experiments, we modify the original hyperparameters to use a batch size of 256 and a weight decay of 5 × 10⁻⁵. During pre-training, we train the DT and IDM for 1M frames. The details of all parameters can be found in Appendix B.
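
The following is a minimal sketch of the architecture and training hyperparameters quoted in the Experiment Setup row above (6 layers, 8 heads, hidden size 512, batch size 256, weight decay 5 × 10⁻⁵, 1M pretraining frames). The names ALPTConfig and build_backbone are illustrative assumptions, not identifiers from the authors' code, and the standard PyTorch encoder is only a stand-in for the GPT-2-style backbone of Lee et al. (2022).

```python
# Hedged sketch of the reported hyperparameters; not the authors' implementation.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class ALPTConfig:
    # Values quoted in the "Experiment Setup" row above.
    num_layers: int = 6               # transformer layers
    num_heads: int = 8                # attention heads per layer
    hidden_size: int = 512            # model width
    batch_size: int = 256             # Maze experiments
    weight_decay: float = 5e-5        # Maze experiments
    pretrain_frames: int = 1_000_000  # DT and IDM pretraining budget


def build_backbone(cfg: ALPTConfig) -> nn.TransformerEncoder:
    """Generic transformer stack with the reported depth, width, and head count."""
    layer = nn.TransformerEncoderLayer(
        d_model=cfg.hidden_size,
        nhead=cfg.num_heads,
        dim_feedforward=4 * cfg.hidden_size,  # conventional 4x expansion; an assumption
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=cfg.num_layers)
```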
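
The paper summarizes ALPT in its Table 1 (not reproduced here) and, per the Experiment Setup row, pretrains a decision transformer (DT) and an inverse dynamics model (IDM). The sketch below shows one plausible reading of that two-phase flow: joint pretraining on action-labelled multi-environment data, then using the IDM to infer missing action labels in the action-limited target dataset so the DT can be finetuned on it. All function and variable names are hypothetical, and the exact interleaving and labelling schedule should be taken from the paper's Table 1 and Appendix B rather than from this sketch.

```python
# Hedged sketch of a two-phase ALPT-style training loop; names are illustrative.
from typing import Callable, Iterable, Tuple

import torch

Batch = Tuple[torch.Tensor, torch.Tensor, torch.Tensor]  # (observations, actions, returns)


def pretrain(dt, idm, multi_env_batches: Iterable[Batch],
             dt_loss: Callable, idm_loss: Callable, opt) -> None:
    """Phase 1: pretrain DT and IDM on fully action-labelled multi-environment data."""
    for obs, actions, returns in multi_env_batches:
        loss = dt_loss(dt, obs, actions, returns) + idm_loss(idm, obs, actions)
        opt.zero_grad()
        loss.backward()
        opt.step()


def finetune(dt, idm, target_batches: Iterable[Tuple[torch.Tensor, torch.Tensor]],
             dt_loss: Callable, opt) -> None:
    """Phase 2: label action-limited target data with the IDM, then finetune the DT."""
    for obs, returns in target_batches:   # target batches lack ground-truth actions
        with torch.no_grad():
            pseudo_actions = idm(obs)     # IDM predicts the missing action labels
        loss = dt_loss(dt, obs, pseudo_actions, returns)
        opt.zero_grad()
        loss.backward()
        opt.step()
```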