reproducibilityindex.ai

Learning General World Models in a Handful of Reward-Free Deployments

Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip Ball, Oleh Rybkin, S Roberts, Tim Rocktäschel, Edward Grefenstette

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite.
Researcher Affiliation	Collaboration	Yingchen Xu Jack Parker-Holder University of Oxford Aldo Pacchiano Microsoft Research Philip J. Ball University of Oxford Oleh Rybkin Stephen J. Roberts University of Oxford Tim Rocktäschel UCL, Cohere Edward Grefenstette UCL, Cohere
Pseudocode	Yes	Algorithm 1 Reward-Free Deployment Efficiency via World Models
Open Source Code	Yes	Code and videos are available at https://ycxuyingchen.github.io/cascade/
Open Datasets	Yes	We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite.
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It describes dynamic data collection and zero-shot transfer, where models are trained on collected data and then evaluated on novel tasks/rewards, rather than predefined dataset splits.
Hardware Specification	Yes	All experiments were run on a single machine with a NVIDIA RTX 3090 GPU, a 2.9GHz Intel Xeon W-1290 processor, and 128 GB of RAM.
Software Dependencies	Yes	Our code is written in Python 3.8 using PyTorch 1.10. We use a number of publicly available libraries, including Gymnasium, Torchrl, RLiable, and Stable Baselines3. We have provided a requirements.txt file in our repository for full reproducibility.
Experiment Setup	Yes	All methods make use of a DreamerV2 world model [40] and use the same hyperparameters for model and agent training (more details in App. B).