Learning General World Models in a Handful of Reward-Free Deployments
Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip Ball, Oleh Rybkin, S Roberts, Tim Rocktäschel, Edward Grefenstette
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite. |
| Researcher Affiliation | Collaboration | Yingchen Xu Jack Parker-Holder University of Oxford Aldo Pacchiano Microsoft Research Philip J. Ball University of Oxford Oleh Rybkin Stephen J. Roberts University of Oxford Tim Rocktäschel UCL, Cohere Edward Grefenstette UCL, Cohere |
| Pseudocode | Yes | Algorithm 1 Reward-Free Deployment Efficiency via World Models |
| Open Source Code | Yes | Code and videos are available at https://ycxuyingchen.github.io/cascade/ |
| Open Datasets | Yes | We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It describes dynamic data collection and zero-shot transfer, where models are trained on collected data and then evaluated on novel tasks/rewards, rather than predefined dataset splits. |
| Hardware Specification | Yes | All experiments were run on a single machine with an NVIDIA RTX 3090 GPU, a 2.9GHz Intel Xeon W-1290 processor, and 128 GB of RAM. |
| Software Dependencies | Yes | Our code is written in Python 3.8 using PyTorch 1.10. We use a number of publicly available libraries, including Gymnasium, TorchRL, RLiable, and Stable Baselines3. We have provided a requirements.txt file in our repository for full reproducibility. |
| Experiment Setup | Yes | All methods make use of a DreamerV2 world model [40] and use the same hyperparameters for model and agent training (more details in App. B). |