RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

Authors: Roberta Raileanu, Tim Rocktäschel

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid, as well as on tasks with high-dimensional observations used in prior work. Our experiments demonstrate that this approach is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments.
Researcher Affiliation | Collaboration | Roberta Raileanu (Facebook AI Research, New York University, raileanu@cs.nyu.edu); Tim Rocktäschel (Facebook AI Research, University College London, rockt@fb.com)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
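Although the paper provides no pseudocode, its core idea is compact enough to sketch. Below is a minimal, illustrative PyTorch sketch (not the authors' code; the class name, network architecture, and count scheme are assumptions) of RIDE's intrinsic reward: the L2 change in a learned state embedding between consecutive states, discounted by the episodic visitation count of the new state.

```python
import torch
import torch.nn as nn

class RIDEReward(nn.Module):
    """Illustrative sketch of RIDE's intrinsic reward (names are assumptions).

    r_int = ||phi(s') - phi(s)||_2 / sqrt(N_ep(s'))

    The embedding phi is trained jointly with forward- and inverse-dynamics
    losses in the paper; those losses are omitted here for brevity.
    """

    def __init__(self, obs_dim: int, embed_dim: int = 128):
        super().__init__()
        # State embedding network; the architecture here is an assumption.
        self.phi = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )
        # Tabular per-episode visitation counts (suited to MiniGrid's
        # discrete observations; pseudo-counts would be needed elsewhere).
        self.episodic_counts = {}

    def reset_episode(self):
        self.episodic_counts.clear()

    def forward(self, obs: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # "Impact" of the transition: change in the learned state representation.
        impact = torch.norm(self.phi(next_obs) - self.phi(obs), p=2, dim=-1)
        # Discount by the episodic count of the resulting state, so the agent
        # is not rewarded for bouncing between the same pair of states.
        key = tuple(next_obs.flatten().tolist())
        self.episodic_counts[key] = self.episodic_counts.get(key, 0) + 1
        return impact / self.episodic_counts[key] ** 0.5
```

The episodic-count discount is the detail that distinguishes RIDE from a plain impact bonus: without it, an agent can farm reward by oscillating between two states whose embeddings differ.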
Open Source Code | No | The paper does not explicitly state that the source code for the described method (RIDE) is publicly available, nor does it provide a direct link to a repository.
Open Datasets | Yes | We evaluate RIDE on procedurally-generated environments from MiniGrid, as well as on two existing singleton environments with high-dimensional observations used in prior work, and compare it against both standard RL and three commonly used intrinsic reward methods for exploration. For the sole purpose of comparing in a fair way to the curiosity-driven exploration work by Pathak et al. (2017), we ran a one-off experiment on their Mario (singleton) environment (Kauten, 2018). The last (singleton) environment we evaluate on is VizDoom (Kempka et al., 2016).
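For readers verifying environment availability, the MiniGrid tasks come from the open-source gym-minigrid package. A minimal usage sketch (the exact environment ID is an assumption and may differ across gym_minigrid versions):

```python
import gym
import gym_minigrid  # registers the MiniGrid environments with gym

# Each reset of a procedurally-generated MiniGrid task samples a new layout,
# unlike the singleton Mario/VizDoom environments, which keep a fixed map.
env = gym.make("MiniGrid-MultiRoom-N6-v0")
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```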
Dataset Splits | No | The paper does not explicitly state training, validation, and test splits (percentages or sample counts) for the main experiments. Appendix A.8 mentions 'training on a set of 4 colors and tested on a held-out set of 2 colors' for one experiment, which implies a held-out split there, but no general train/validation/test split is given across all experiments.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions PyTorch and the RMSProp optimizer and references the IMPALA implementation, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | We ran grid searches over the learning rate [0.0001, 0.0005, 0.001], batch size [8, 32], and unroll length [20, 40, 100, 200]. The best values for all models can be found in Table 2. The learning rate is linearly annealed to 0 in all experiments. Best hyperparameters: Learning Rate = 0.0001; Batch Size = 32; Unroll Length = 100; Discount = 0.99; RMSProp Momentum = 0.0; RMSProp ε = 0.01; Clip Gradient Norm = 40.0.
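These reported values map directly onto a standard PyTorch optimizer configuration. A minimal sketch, assuming an IMPALA-style training loop and using only the hyperparameters quoted above (function names are illustrative):

```python
import torch

def build_optimizer(model: torch.nn.Module, total_updates: int):
    # Reported hyperparameters: lr 0.0001, RMSProp momentum 0.0, RMSProp eps 0.01.
    optimizer = torch.optim.RMSprop(
        model.parameters(), lr=0.0001, momentum=0.0, eps=0.01
    )
    # The learning rate is linearly annealed to 0 over training.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda step: 1.0 - step / total_updates
    )
    return optimizer, scheduler

def training_step(model, loss, optimizer, scheduler):
    optimizer.zero_grad()
    loss.backward()
    # Reported gradient clipping: global norm 40.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 40.0)
    optimizer.step()
    scheduler.step()
```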