Reward-Free Curricula for Training Robust World Models
Authors: Marc Rigter, Minqi Jiang, Ingmar Posner
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that WAKER outperforms several baselines, resulting in improved robustness, efficiency, and generalisation. |
| Researcher Affiliation | Academia | Marc Rigter (University of Oxford, marcrigter@gmail.com); Minqi Jiang (University College London); Ingmar Posner (University of Oxford) |
| Pseudocode | Yes | Algorithm 1: Weighted Acquisition of Knowledge across Environments for Robustness (WAKER); a hedged sketch of the sampling step appears below the table. |
| Open Source Code | Yes | The code for our experiments is available at github.com/marcrigter/waker. |
| Open Datasets | Yes | For Terrain Walker and Terrain Hopper we simulate the Walker and Hopper robots from the DMControl Suite (Tassa et al., 2018) on procedurally generated terrain. ... The Clean Up and Car Clean Up domains are based on Safety Gym (Ray et al., 2019)... |
| Dataset Splits | No | The paper describes evaluation on 'randomly sampled environments' and 'out-of-distribution environments', but does not provide specific train/validation/test dataset splits with percentages or counts for reproducibility. |
| Hardware Specification | Yes | Each world model training run takes 6 days on an NVIDIA V100 GPU. |
| Software Dependencies | Yes | For the world model, we use the official open-source implementation of Dreamer V2 (Hafner et al., 2021) at https://github.com/danijar/dreamerv2. For the world model training we use the default hyperparameters from Dreamer V2... |
| Experiment Setup | Yes | For the world model training we use the default hyperparameters from Dreamer V2, with the default batch size of 16 trajectories of 50 steps each. We set p_DR = 0.2 for all experiments and did not tune this value. We performed limited hyperparameter tuning of the Boltzmann temperature parameter, η. |
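
The table references Algorithm 1 (WAKER) together with the domain-randomization probability p_DR = 0.2 and the Boltzmann temperature η. As a reading aid, the snippet below sketches how an error-weighted environment-sampling step of this kind could look. The per-environment error estimates, the function name `sample_environment`, and the NumPy-based implementation are assumptions for illustration only, not the authors' code; see github.com/marcrigter/waker for the official implementation.

```python
# Minimal sketch of a WAKER-style environment-sampling step, assuming one
# world-model error estimate per candidate environment, a Boltzmann
# distribution with temperature eta, and a p_DR fallback to uniform domain
# randomisation (the latter two values are reported in the table above).
import numpy as np


def sample_environment(error_estimates, eta=1.0, p_dr=0.2, rng=None):
    """Pick the index of the next environment to collect data in.

    error_estimates: 1-D array with one estimated world-model error per
        candidate environment; larger values mean the model is less accurate there.
    eta: Boltzmann temperature controlling how strongly sampling is biased
        toward high-error environments.
    p_dr: probability of falling back to plain domain randomisation
        (uniform sampling); 0.2 in the paper's experiments.
    """
    rng = rng or np.random.default_rng()
    errors = np.asarray(error_estimates, dtype=np.float64)

    # With probability p_DR, ignore the error estimates and sample uniformly.
    if rng.random() < p_dr:
        return int(rng.integers(len(errors)))

    # Otherwise, sample from a Boltzmann (softmax) distribution over the
    # per-environment error estimates with temperature eta.
    logits = errors / eta
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(errors), p=probs))


# Example usage: three candidate environments with different estimated errors.
if __name__ == "__main__":
    env_idx = sample_environment([0.1, 0.5, 0.9], eta=0.25, p_dr=0.2)
    print(f"collect the next trajectory in environment {env_idx}")
```

In this sketch, smaller values of η concentrate sampling on the environments where the world model's estimated error is highest, while the p_DR fallback keeps some uniform coverage of the environment distribution.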