Learning General World Models in a Handful of Reward-Free Deployments

Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite.
Researcher Affiliation | Collaboration | Yingchen Xu; Jack Parker-Holder (University of Oxford); Aldo Pacchiano (Microsoft Research); Philip J. Ball (University of Oxford); Oleh Rybkin; Stephen J. Roberts (University of Oxford); Tim Rocktäschel (UCL, Cohere); Edward Grefenstette (UCL, Cohere)
Pseudocode | Yes | Algorithm 1: Reward-Free Deployment Efficiency via World Models
Open Source Code | Yes | Code and videos are available at https://ycxuyingchen.github.io/cascade/
Open Datasets | Yes | We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts. It describes dynamic data collection and zero-shot transfer: models are trained on the collected data and then evaluated on novel tasks or rewards, rather than on predefined dataset splits.
Hardware Specification | Yes | All experiments were run on a single machine with an NVIDIA RTX 3090 GPU, a 2.9 GHz Intel Xeon W-1290 processor, and 128 GB of RAM.
Software Dependencies | Yes | Our code is written in Python 3.8 using PyTorch 1.10. We use a number of publicly available libraries, including Gymnasium, TorchRL, rliable, and Stable Baselines3. We have provided a requirements.txt file in our repository for full reproducibility.
Experiment Setup | Yes | All methods use a DreamerV2 world model [40] and share the same hyperparameters for model and agent training (more details in App. B).
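The Pseudocode entry cites Algorithm 1 (Reward-Free Deployment Efficiency via World Models) without reproducing it. What follows is a toy Python sketch of the setting that title describes. It is not the authors' algorithm.

The sketch works as follows:
- Only a small, fixed number of deployments is allowed.
- Between deployments, an ensemble of world models is fit offline on all data collected so far.
- A population of explorers is then chosen in imagination, one at a time. Each explorer greedily maximises ensemble uncertainty on state-action pairs that the explorers chosen before it do not already cover.
- Only then is the population executed in the environment.

Everything here is a hypothetical stand-in. That includes the chain environment, the bootstrapped tabular "world models", the open-loop plans, the coverage-based greedy objective, and every function name (`fit_ensemble`, `select_population`, `execute`, `run`). The paper uses a DreamerV2 world model and the CASCADE exploration objective instead, and this sketch does not implement either.

```python
import random
from collections import Counter

N_STATES = 10       # hypothetical toy chain environment: states 0..9
ACTIONS = (-1, 1)   # move left / move right
START = 0           # every explorer starts here


def env_step(s, a):
    """True dynamics, touched only inside execute(): move along the chain, clipped."""
    return min(max(s + a, 0), N_STATES - 1)


def fit_ensemble(data, n_models, rng):
    """Each tabular 'world model' memorises next states from a bootstrap resample."""
    return [{(s, a): s2 for s, a, s2 in rng.choices(data, k=len(data))}
            for _ in range(n_models)]


def uncertainty(models, s, a):
    """1.0 if any member has not seen (s, a); otherwise disagreement between members."""
    preds = [m.get((s, a)) for m in models]
    if None in preds:
        return 1.0
    return float(len(set(preds)) - 1)


def imagine(models, plan):
    """Roll an open-loop plan through the ensemble's majority prediction."""
    s, visited = START, []
    for a in plan:
        visited.append((s, a))
        known = [m[(s, a)] for m in models if (s, a) in m]
        if not known:  # no member can predict past this pair, so stop imagining
            break
        s = Counter(known).most_common(1)[0][0]
    return visited


def novel_gain(models, plan, covered):
    """Uncertainty summed over imagined pairs not yet covered by earlier explorers."""
    return sum(uncertainty(models, s, a) for s, a in set(imagine(models, plan)) - covered)


def select_population(models, n_explorers, n_candidates, horizon, rng):
    """Greedy cascade: pick explorers one at a time, each conditioned on those before it."""
    candidates = [[rng.choice(ACTIONS) for _ in range(horizon)] for _ in range(n_candidates)]
    covered, population = set(), []
    for _ in range(n_explorers):
        best = max(candidates, key=lambda plan: novel_gain(models, plan, covered))
        population.append(best)
        covered |= set(imagine(models, best))
    return population


def execute(plan):
    """Run one explorer in the real environment and record (s, a, s') transitions."""
    s, transitions = START, []
    for a in plan:
        s2 = env_step(s, a)
        transitions.append((s, a, s2))
        s = s2
    return transitions


def run(n_deployments=3, n_explorers=4, horizon=12, seed=0):
    """Reward-free loop: no reward is observed anywhere, only transitions."""
    rng = random.Random(seed)
    data = execute([rng.choice(ACTIONS) for _ in range(5)])  # tiny random seed dataset
    for _ in range(n_deployments):
        models = fit_ensemble(data, n_models=5, rng=rng)      # offline, between deployments
        for plan in select_population(models, n_explorers, 64, horizon, rng):
            data += execute(plan)                             # the only real-env interaction
    return data
```

For example, `len({(s, a) for s, a, _ in run()})` counts the distinct state-action pairs gathered over three deployments. In this toy, real-environment interaction happens only inside `execute`, so the data budget is set by `n_deployments`. Fitting the ensemble and choosing explorers in imagination happen offline between deployments. The `covered` set is a crude analogue of coordinating a population so that its members do not all target the same uncertain region.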
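The Dataset Splits entry notes that the paper uses zero-shot evaluation on rewards revealed only after data collection, rather than fixed splits. Below is a toy sketch of that protocol. A deterministic tabular model and value iteration stand in for the DreamerV2 world model and agent used in the paper.

The protocol runs in four steps:
1. The model is fit once on reward-free transitions.
2. A downstream reward is supplied afterwards.
3. A policy is planned inside the model.
4. The policy is scored in the real environment.

All names (`fit_model`, `plan_in_model`, `evaluate`, `goal_reward`) and the chain environment are hypothetical.

```python
N_STATES = 10        # hypothetical toy chain environment: states 0..9
ACTIONS = (-1, 1)    # move left / move right


def env_step(s, a):
    """True dynamics, used only for final evaluation: move along the chain, clipped."""
    return min(max(s + a, 0), N_STATES - 1)


def fit_model(transitions):
    """Deterministic tabular world model (s, a) -> s', fit on reward-free data."""
    return {(s, a): s2 for s, a, s2 in transitions}


def plan_in_model(model, reward_fn, gamma=0.9, iters=100):
    """Value iteration inside the learned model for a reward supplied after collection.
    Pairs the model has never seen are given zero value (a simplifying assumption)."""
    values = [0.0] * N_STATES

    def q(s, a):
        if (s, a) not in model:
            return 0.0
        s2 = model[(s, a)]
        return reward_fn(s2) + gamma * values[s2]

    for _ in range(iters):
        values = [max(q(s, a) for a in ACTIONS) for s in range(N_STATES)]
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)}


def evaluate(policy, reward_fn, start=0, horizon=20):
    """Execute the planned policy in the real environment and sum the new reward."""
    s, total = start, 0.0
    for _ in range(horizon):
        s = env_step(s, policy[s])
        total += reward_fn(s)
    return total


def goal_reward(s):
    """Hypothetical downstream task, unknown during collection: be at state 7."""
    return 1.0 if s == 7 else 0.0


full = [(s, a, env_step(s, a)) for s in range(N_STATES) for a in ACTIONS]
narrow = [(s, a, env_step(s, a)) for s in range(4) for a in ACTIONS]
print(evaluate(plan_in_model(fit_model(full), goal_reward), goal_reward))    # 7.0
print(evaluate(plan_in_model(fit_model(narrow), goal_reward), goal_reward))  # 0.0
```

The two datasets give different results:
- With transitions covering the whole chain, the planned policy walks to state 7 and then oscillates around it, collecting reward on every other step.
- With data from only the first four states, the model never observes the goal, so the same procedure scores zero.

In this toy, the coverage of the reward-free dataset determines what the zero-shot policy can achieve.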