Reward-Free Curricula for Training Robust World Models
Authors: Marc Rigter, Minqi Jiang, Ingmar Posner
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that WAKER outperforms several baselines, resulting in improved robustness, efficiency, and generalisation. |
| Researcher Affiliation | Academia | Marc Rigter (University of Oxford, marcrigter@gmail.com); Minqi Jiang (University College London); Ingmar Posner (University of Oxford) |
| Pseudocode | Yes | Algorithm 1: Weighted Acquisition of Knowledge across Environments for Robustness (WAKER); a hedged sketch of the sampling step appears below the table. |
| Open Source Code | Yes | The code for our experiments is available at github.com/marcrigter/waker. |
| Open Datasets | Yes | For Terrain Walker and Terrain Hopper we simulate the Walker and Hopper robots from the DMControl Suite (Tassa et al., 2018) on procedurally generated terrain. ... The Clean Up and Car Clean Up domains are based on Safety Gym (Ray et al., 2019)... |
| Dataset Splits | No | The paper describes evaluation on 'randomly sampled environments' and 'out-of-distribution environments', but does not provide specific train/validation/test dataset splits with percentages or counts for reproducibility. |
| Hardware Specification | Yes | Each world model training run takes 6 days on an NVIDIA V100 GPU. |
| Software Dependencies | Yes | For the world model, we use the official open-source implementation of Dreamer V2 (Hafner et al., 2021) at https://github.com/danijar/dreamerv2. For the world model training we use the default hyperparameters from Dreamer V2... |
| Experiment Setup | Yes | For the world model training we use the default hyperparameters from Dreamer V2, with the default batch size of 16 trajectories of 50 steps each. We set p_DR = 0.2 for all experiments and did not tune this value. We performed limited hyperparameter tuning of the Boltzmann temperature parameter, η. |
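
The table references Algorithm 1 (WAKER) together with the domain-randomization probability p_DR = 0.2 and the Boltzmann temperature η. As a reading aid, the snippet below sketches how an error-weighted environment-sampling step of this kind could look. The per-environment error estimates, the function name `sample_environment`, and the NumPy-based implementation are assumptions for illustration only, not the authors' code; see github.com/marcrigter/waker for the official implementation.

```python
# Minimal sketch of a WAKER-style environment-sampling step, assuming one
# world-model error estimate per candidate environment, a Boltzmann
# distribution with temperature eta, and a p_DR fallback to uniform domain
# randomisation (the latter two values are reported in the table above).
import numpy as np


def sample_environment(error_estimates, eta=1.0, p_dr=0.2, rng=None):
    """Pick the index of the next environment to collect data in.

    error_estimates: 1-D array with one estimated world-model error per
        candidate environment; larger values mean the model is less accurate there.
    eta: Boltzmann temperature controlling how strongly sampling is biased
        toward high-error environments.
    p_dr: probability of falling back to plain domain randomisation
        (uniform sampling); 0.2 in the paper's experiments.
    """
    rng = rng or np.random.default_rng()
    errors = np.asarray(error_estimates, dtype=np.float64)

    # With probability p_DR, ignore the error estimates and sample uniformly.
    if rng.random() < p_dr:
        return int(rng.integers(len(errors)))

    # Otherwise, sample from a Boltzmann (softmax) distribution over the
    # per-environment error estimates with temperature eta.
    logits = errors / eta
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(errors), p=probs))


# Example usage: three candidate environments with different estimated errors.
if __name__ == "__main__":
    env_idx = sample_environment([0.1, 0.5, 0.9], eta=0.25, p_dr=0.2)
    print(f"collect the next trajectory in environment {env_idx}")
```

In this sketch, smaller values of η concentrate sampling on the environments where the world model's estimated error is highest, while the p_DR fallback keeps some uniform coverage of the environment distribution.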