On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
Authors: Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method and a set of strong baselines extensively across 14 Atari 2600 games from the Atari100k benchmark (Kaiser et al., 2020) where only 100k environment steps are permitted. We provide an implementation of our method at https://nicklashansen.github.io/xtra. Table 1. Atari100k benchmark results (similar pretraining tasks). Methods are evaluated at 100k environment steps. |
| Researcher Affiliation | Academia | University of California, San Diego {yix081, nihansen, ziw029, ychan, has168, ztu}@ucsd.edu |
| Pseudocode | No | The paper describes the framework and methods in text and uses diagrams (Figure 2, Figure 3, Figure 4) and mathematical formulas, but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We provide an implementation of our method at https://nicklashansen.github.io/xtra. Code for reproducing our experiments is made available at https://nicklashansen.github.io/xtra. |
| Open Datasets | Yes | We base our experiments on the ALE due to cues that are easily identifiable to humans despite great diversity in tasks, and identify two key ingredients, cross-task finetuning and task alignment, for model-based adaptation that improve sample-efficiency substantially compared to models learned tabula rasa. We evaluate our method and a set of strong baselines extensively across 14 Atari 2600 games from the Atari100k benchmark (Kaiser et al., 2020), where only 100k environment steps are permitted. To train the model in the offline multi-task pretraining stage, we use trajectories collected by EfficientZero (Ye et al., 2021) on the Atari100k benchmark. |
| Dataset Splits | No | The paper mentions 'evaluation episodes' and 'environment steps' which relate to the test phase, but does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the dataset used. |
| Hardware Specification | No | The paper discusses reinforcement learning experiments but does not provide any specific details about the hardware used (e.g., CPU, GPU, or TPU models, or cloud computing configurations). |
| Software Dependencies | No | The paper mentions using the Arcade Learning Environment (ALE) and building upon Efficient Zero, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | We adopt our hyper-parameters from EfficientZero (Ye et al., 2021) with minimal modification. Table 8 (Hyper-parameters) lists all relevant values, adopted from Ye et al. (2021) with minimal modification but included for completeness: minibatch size (offline tasks): 256; minibatch size (target task): 256; optimizer: SGD; learning rate: 0.2; momentum: 0.9; weight decay (c): 0.0001; training steps: 100K/120K; evaluation episodes: 32; number of MCTS simulations (Nsim): 50. (See the config sketch after this table.) |
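
To make the flattened hyper-parameter listing easier to scan, the sketch below restates the reported Table 8 values as a plain-Python config, together with the Atari100k evaluation budget quoted in the table above. The dictionary keys and the `make_sgd` helper are illustrative assumptions, not the authors' released code; PyTorch is assumed only for the optimizer example.

```python
# Illustrative sketch only: a plain-Python rendering of the hyper-parameters the
# paper reports adopting from EfficientZero (Ye et al., 2021), plus the Atari100k
# evaluation protocol quoted above. Key names are hypothetical, not the authors' code.

ATARI100K_PROTOCOL = {
    "num_games": 14,              # Atari 2600 games evaluated in the paper
    "env_steps_budget": 100_000,  # Atari100k: only 100k environment steps permitted
    "evaluation_episodes": 32,
}

XTRA_HYPERPARAMS = {
    "minibatch_size_offline_tasks": 256,
    "minibatch_size_target_task": 256,
    "optimizer": "SGD",
    "learning_rate": 0.2,
    "momentum": 0.9,
    "weight_decay": 1e-4,                   # weight-decay coefficient c
    "training_steps": (100_000, 120_000),   # reported as "100K/120K"; which setting uses which is not stated here
    "mcts_num_simulations": 50,             # Nsim in MCTS
}

def make_sgd(params):
    """Build an SGD optimizer with the reported settings (PyTorch assumed)."""
    import torch
    return torch.optim.SGD(
        params,
        lr=XTRA_HYPERPARAMS["learning_rate"],
        momentum=XTRA_HYPERPARAMS["momentum"],
        weight_decay=XTRA_HYPERPARAMS["weight_decay"],
    )
```

This is only a restatement of the values quoted from the paper; the authors' actual training loop, network definitions, and any hyper-parameters not listed in Table 8 would have to be taken from their released implementation at https://nicklashansen.github.io/xtra.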