On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
Authors: Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method and a set of strong baselines extensively across 14 Atari 2600 games from the Atari100k benchmark (Kaiser et al., 2020) where only 100k environment steps are permitted. We provide an implementation of our method at https://nicklashansen.github.io/xtra. Table 1. Atari100k benchmark results (similar pretraining tasks). Methods are evaluated at 100k environment steps. |
| Researcher Affiliation | Academia | University of California, San Diego {yix081, nihansen, ziw029, ychan, has168, ztu}@ucsd.edu |
| Pseudocode | No | The paper describes the framework and methods in text and uses diagrams (Figure 2, Figure 3, Figure 4) and mathematical formulas, but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We provide an implementation of our method at https://nicklashansen.github.io/xtra. Code for reproducing our experiments is made available at https://nicklashansen.github.io/xtra. |
| Open Datasets | Yes | We base our experiments on the ALE due to cues that are easily identifiable to humans despite great diversity in tasks, and identify two key ingredients, cross-task finetuning and task alignment, for model-based adaptation that improve sample-efficiency substantially compared to models learned tabula rasa. We evaluate our method and a set of strong baselines extensively across 14 Atari 2600 games from the Atari100k benchmark (Kaiser et al., 2020), where only 100k environment steps are permitted. To train the model in the offline multi-task pretraining stage, we use trajectories collected by EfficientZero (Ye et al., 2021) on the Atari100k benchmark. |
| Dataset Splits | No | The paper mentions 'evaluation episodes' and 'environment steps' which relate to the test phase, but does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the dataset used. |
| Hardware Specification | No | The paper discusses reinforcement learning experiments but does not provide any specific details about the hardware used (e.g., CPU, GPU, or TPU models, or cloud computing configurations). |
| Software Dependencies | No | The paper mentions using the Arcade Learning Environment (ALE) and building upon Efficient Zero, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | We adopt our hyper-parameters from EfficientZero (Ye et al., 2021) with minimal modification. Table 8 (Hyper-parameters) lists all relevant values, adopted from Ye et al. (2021) with minimal modification but included for completeness: minibatch size (offline tasks): 256; minibatch size (target task): 256; optimizer: SGD; learning rate: 0.2; momentum: 0.9; weight decay (c): 0.0001; training steps: 100K/120K; evaluation episodes: 32; number of MCTS simulations (Nsim): 50. (See the config sketch after this table.) |
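
To make the flattened hyper-parameter listing easier to scan, the sketch below restates the reported Table 8 values as a plain-Python config, together with the Atari100k evaluation budget quoted in the table above. The dictionary keys and the `make_sgd` helper are illustrative assumptions, not the authors' released code; PyTorch is assumed only for the optimizer example.

```python
# Illustrative sketch only: a plain-Python rendering of the hyper-parameters the
# paper reports adopting from EfficientZero (Ye et al., 2021), plus the Atari100k
# evaluation protocol quoted above. Key names are hypothetical, not the authors' code.

ATARI100K_PROTOCOL = {
    "num_games": 14,              # Atari 2600 games evaluated in the paper
    "env_steps_budget": 100_000,  # Atari100k: only 100k environment steps permitted
    "evaluation_episodes": 32,
}

XTRA_HYPERPARAMS = {
    "minibatch_size_offline_tasks": 256,
    "minibatch_size_target_task": 256,
    "optimizer": "SGD",
    "learning_rate": 0.2,
    "momentum": 0.9,
    "weight_decay": 1e-4,                   # weight-decay coefficient c
    "training_steps": (100_000, 120_000),   # reported as "100K/120K"; which setting uses which is not stated here
    "mcts_num_simulations": 50,             # Nsim in MCTS
}

def make_sgd(params):
    """Build an SGD optimizer with the reported settings (PyTorch assumed)."""
    import torch
    return torch.optim.SGD(
        params,
        lr=XTRA_HYPERPARAMS["learning_rate"],
        momentum=XTRA_HYPERPARAMS["momentum"],
        weight_decay=XTRA_HYPERPARAMS["weight_decay"],
    )
```

This is only a restatement of the values quoted from the paper; the authors' actual training loop, network definitions, and any hyper-parameters not listed in Table 8 would have to be taken from their released implementation at https://nicklashansen.github.io/xtra.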