RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Authors: Roberta Raileanu, Tim Rocktäschel
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid, as well as on tasks with high-dimensional observations used in prior work. Our experiments demonstrate that this approach is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments. (A sketch of the impact-driven reward appears after this table.) |
| Researcher Affiliation | Collaboration | Roberta Raileanu (Facebook AI Research; New York University) raileanu@cs.nyu.edu; Tim Rocktäschel (Facebook AI Research; University College London) rockt@fb.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology (RIDE) is publicly available or provide a direct link to it. |
| Open Datasets | Yes | We evaluate RIDE on procedurally-generated environments from MiniGrid, as well as on two existing singleton environments with high-dimensional observations used in prior work, and compare it against both standard RL and three commonly used intrinsic reward methods for exploration. For the sole purpose of a fair comparison with the curiosity-driven exploration work of Pathak et al. (2017), we ran a one-off experiment on their Mario (singleton) environment (Kauten, 2018). The last (singleton) environment we evaluate on is VizDoom (Kempka et al., 2016). (See the environment-setup sketch after this table.) |
| Dataset Splits | No | The paper does not explicitly state training, validation, and test dataset splits with percentages or sample counts for the main experiments. It mentions 'training on a set of 4 colors and tested on a held-out set of 2 colors' for a specific experiment in Appendix A.8, which implies a split for that experiment but not a general train/validation/test protocol across all experiments. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as PyTorch and the RMSProp optimizer and references an IMPALA implementation, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | We ran grid searches over the learning rate [0.0001, 0.0005, 0.001], batch size [8, 32], and unroll length [20, 40, 100, 200]. The best values for all models can be found in Table 2. The learning rate is linearly annealed to 0 in all experiments. Best values: learning rate 0.0001; batch size 32; unroll length 100; discount 0.99; RMSProp momentum 0.0; RMSProp ε 0.01; gradient norm clipping 40.0. (An optimizer-configuration sketch follows this table.) |
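
For context on the environments listed above, here is a minimal sketch of instantiating one of the procedurally-generated MiniGrid tasks. It assumes the `gym-minigrid` package and the classic Gym reset/step API; the specific environment id is illustrative and not taken from the paper.

```python
import gym
import gym_minigrid  # noqa: F401 -- registers the MiniGrid-* environment ids

# MultiRoom is one of the MiniGrid task families used in the paper; each
# reset procedurally generates a fresh layout, so the agent cannot
# memorize a single maze.
env = gym.make("MiniGrid-MultiRoom-N6-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```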
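RIDE's intrinsic reward is the L2 change in a learned state embedding, discounted by the square root of an episodic state-visitation count. The sketch below is a simplified rendering of that idea; the placeholder encoder `phi`, the tabular count keyed on the raw observation, and all layer sizes are assumptions (in the paper the embedding is trained with forward- and inverse-dynamics losses), so treat it as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ImpactDrivenReward(nn.Module):
    """Simplified intrinsic reward in the spirit of RIDE."""

    def __init__(self, obs_dim, emb_dim=128):
        super().__init__()
        # Placeholder encoder; the paper trains phi jointly with
        # forward- and inverse-dynamics models.
        self.phi = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )
        self.episodic_counts = {}  # state key -> visits this episode

    def reset_episode(self):
        self.episodic_counts.clear()

    def intrinsic_reward(self, obs, next_obs):
        # Impact: L2 distance between consecutive state embeddings.
        with torch.no_grad():
            impact = torch.norm(self.phi(next_obs) - self.phi(obs), p=2)
        # Episodic count; tabular keying is only feasible for small grids.
        key = tuple(next_obs.flatten().tolist())
        self.episodic_counts[key] = self.episodic_counts.get(key, 0) + 1
        return impact / self.episodic_counts[key] ** 0.5
```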
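The reported hyperparameters map directly onto a PyTorch RMSProp setup. The sketch below plugs in the table's values; the stand-in model and the `TOTAL_UPDATES` horizon for the linear learning-rate anneal are assumptions, since the section does not state the training length.

```python
import torch

policy = torch.nn.Linear(128, 7)  # hypothetical stand-in for the agent
TOTAL_UPDATES = 10_000            # assumed anneal horizon (not reported here)

optimizer = torch.optim.RMSprop(
    policy.parameters(),
    lr=1e-4,       # best learning rate from the grid search
    momentum=0.0,  # RMSProp momentum
    eps=0.01,      # RMSProp epsilon
)
# The learning rate is linearly annealed to 0 over training, as stated.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / TOTAL_UPDATES)
)

loss = policy(torch.randn(32, 128)).pow(2).mean()  # batch size 32 from table
loss.backward()
# Gradients are clipped to a max norm of 40.0 before each update.
torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=40.0)
optimizer.step()
scheduler.step()
```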