Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

Authors: David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our method on Atari game-playing environments and show that with target environment data equivalent to only 12 minutes of gameplay, we can significantly improve game performance and generalization capability compared to other approaches.
Researcher Affiliation Collaboration Mila; McGill University; Google DeepMind; University of California, Berkeley.
Pseudocode Yes Table 1: A summary of ALPT.
Open Source Code Yes We make the source code publicly available for our Maze experiment only at this time. The details can be found at: https://anonymous.4open.science/r/alpt_maze-5927/README.md.
Open Datasets Yes As in Lee et al. (2022), we use the standard offline RL Atari datasets from RL Unplugged (Gulcehre et al., 2020).
Dataset Splits No The paper describes pretraining and finetuning phases, and mentions evaluating performance, but does not explicitly specify a validation dataset split used during training for purposes like hyperparameter tuning or early stopping.
Hardware Specification No The paper describes architecture details and training hyperparameters (e.g., 'transformer with 6 layers of 8 heads each and hidden size 512', 'batch size of 256'), but does not specify any particular hardware used for training or evaluation, such as CPU or GPU models.
Software Dependencies No The paper mentions using a 'transformer' architecture, 'GPT-2 transformer architecture', 'decision transformers (DT)', and references 'RL Unplugged (Gulcehre et al., 2020)', but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes Our architecture and training protocol follow the multi-game Atari setting outlined in Lee et al. (2022). Specifically, we use a transformer with 6 layers of 8 heads each and hidden size 512. The rest of the architecture and training hyperparameters remain unchanged for experiments on Atari. For the Maze navigation experiments, we modify the original hyperparameters to use a batch size of 256 and a weight decay of 5 × 10⁻⁵. During pre-training, we train the DT and IDM for 1M frames. The details of all parameters can be found in Appendix B.
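
The following is a minimal sketch of the architecture and training hyperparameters quoted in the Experiment Setup row above (6 layers, 8 heads, hidden size 512, batch size 256, weight decay 5 × 10⁻⁵, 1M pretraining frames). The names ALPTConfig and build_backbone are illustrative assumptions, not identifiers from the authors' code, and the standard PyTorch encoder is only a stand-in for the GPT-2-style backbone of Lee et al. (2022).

```python
# Hedged sketch of the reported hyperparameters; not the authors' implementation.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class ALPTConfig:
    # Values quoted in the "Experiment Setup" row above.
    num_layers: int = 6               # transformer layers
    num_heads: int = 8                # attention heads per layer
    hidden_size: int = 512            # model width
    batch_size: int = 256             # Maze experiments
    weight_decay: float = 5e-5        # Maze experiments
    pretrain_frames: int = 1_000_000  # DT and IDM pretraining budget


def build_backbone(cfg: ALPTConfig) -> nn.TransformerEncoder:
    """Generic transformer stack with the reported depth, width, and head count."""
    layer = nn.TransformerEncoderLayer(
        d_model=cfg.hidden_size,
        nhead=cfg.num_heads,
        dim_feedforward=4 * cfg.hidden_size,  # conventional 4x expansion; an assumption
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=cfg.num_layers)
```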
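
The paper summarizes ALPT in its Table 1 (not reproduced here) and, per the Experiment Setup row, pretrains a decision transformer (DT) and an inverse dynamics model (IDM). The sketch below shows one plausible reading of that two-phase flow: joint pretraining on action-labelled multi-environment data, then using the IDM to infer missing action labels in the action-limited target dataset so the DT can be finetuned on it. All function and variable names are hypothetical, and the exact interleaving and labelling schedule should be taken from the paper's Table 1 and Appendix B rather than from this sketch.

```python
# Hedged sketch of a two-phase ALPT-style training loop; names are illustrative.
from typing import Callable, Iterable, Tuple

import torch

Batch = Tuple[torch.Tensor, torch.Tensor, torch.Tensor]  # (observations, actions, returns)


def pretrain(dt, idm, multi_env_batches: Iterable[Batch],
             dt_loss: Callable, idm_loss: Callable, opt) -> None:
    """Phase 1: pretrain DT and IDM on fully action-labelled multi-environment data."""
    for obs, actions, returns in multi_env_batches:
        loss = dt_loss(dt, obs, actions, returns) + idm_loss(idm, obs, actions)
        opt.zero_grad()
        loss.backward()
        opt.step()


def finetune(dt, idm, target_batches: Iterable[Tuple[torch.Tensor, torch.Tensor]],
             dt_loss: Callable, opt) -> None:
    """Phase 2: label action-limited target data with the IDM, then finetune the DT."""
    for obs, returns in target_batches:   # target batches lack ground-truth actions
        with torch.no_grad():
            pseudo_actions = idm(obs)     # IDM predicts the missing action labels
        loss = dt_loss(dt, obs, pseudo_actions, returns)
        opt.zero_grad()
        loss.backward()
        opt.step()
```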