Go Beyond Imagination: Maximizing Episodic Reachability with World Models
Authors: Yao Fu, Run Peng, Honglak Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate GoBI in two domains: 2D procedurally-generated Minigrid environments (Chevalier-Boisvert et al., 2018a) with hard-exploration tasks, and locomotion tasks from the DeepMind Control Suite (Tunyasuvunakool et al., 2020). The experiments are designed to answer the following research questions: (1) How does GoBI perform against previous state-of-the-art intrinsic reward designs in terms of training-time sample efficiency on challenging procedurally-generated environments? (2) Can GoBI successfully extend to complex continuous domains with high-dimensional observations, for example control tasks with visual observations? (3) How does each component of our intrinsic reward contribute to the performance? (4) What is the influence of the accuracy of the learned world models on our method? |
| Researcher Affiliation | Collaboration | 1) University of Michigan 2) LG AI Research. |
| Pseudocode | Yes | We summarize our method in Algorithm 1 and illustrate the training process on Minigrid navigation tasks in Figure 1. |
| Open Source Code | No | The paper mentions building on official codebases (e.g., NovelD, RE3) but does not provide a statement or link for the open-sourcing of *their own* described methodology's code. |
| Open Datasets | Yes | 2D procedurally-generated Minigrid environments (Chevalier-Boisvert et al., 2018a) with hard-exploration tasks and locomotion tasks from the DeepMind Control Suite (Tunyasuvunakool et al., 2020). |
| Dataset Splits | No | The paper does not describe explicit train/validation/test dataset splits; evaluation is performed in procedurally-generated environments and simulated control tasks rather than on fixed datasets. |
| Hardware Specification | Yes | All experiments are run with the same compute resources: an Nvidia TITAN X GPU and 40 CPUs. |
| Software Dependencies | No | The paper mentions base algorithms like IMPALA and RAD, and specific libraries or toolkits like OpenAI Gym and DeepMind Control Suite, but does not provide specific version numbers for software dependencies (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Tables 1 and 3 list the hyper-parameter values shared across different methods; Table 2 gives the GoBI hyper-parameters for the Minigrid experiments, and Table 4 the hyper-parameters for the DeepMind Control Suite experiments. |
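The "Pseudocode" row notes that the method is summarized in Algorithm 1, which the paper describes as an intrinsic reward based on maximizing episodic reachability under a learned world model. As a rough illustration of that general idea (not the authors' actual algorithm — all names and the coarse discretization here are hypothetical simplifications), one can sketch an episodic bonus that counts how many previously unreached states a one-step world-model rollout can newly reach:

```python
# Illustrative sketch only, not GoBI itself: an episodic intrinsic reward
# that counts newly reachable (discretized) states predicted by a world model.

def intrinsic_reward(state, actions, world_model, reached):
    """Return the number of predicted next states not yet in `reached`.

    state:       current state (sequence of floats)
    actions:     candidate actions to imagine from `state`
    world_model: callable (state, action) -> predicted next state
    reached:     per-episode set of discretized states, updated in place
    """
    new_states = 0
    for action in actions:
        predicted = world_model(state, action)
        # Coarse rounding stands in for whatever state abstraction is used.
        key = tuple(round(x, 1) for x in predicted)
        if key not in reached:
            reached.add(key)
            new_states += 1
    return new_states


# Toy usage: a deterministic 1D "world model" that adds the action to the state.
world_model = lambda s, a: [s[0] + a]
reached = set()
print(intrinsic_reward([0.0], [0, 1], world_model, reached))  # 2 novel states
print(intrinsic_reward([0.0], [0, 1], world_model, reached))  # 0, already reached
```

The episodic aspect is captured by resetting `reached` at each episode boundary, so the bonus decays within an episode as imagined states stop being novel.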