Planning to Explore via Self-Supervised World Models

Authors: Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

ICML 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate on challenging control tasks from high-dimensional image inputs. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods, and in fact, almost matches the performance of an oracle which has access to rewards." |
| Researcher Affiliation | Collaboration | 1 University of Pennsylvania, 2 UC Berkeley, 3 Google Research, Brain Team, 4 University of Toronto, 5 Carnegie Mellon University, 6 Facebook AI Research |
| Pseudocode | Yes | "Algorithm 1: Planning to Explore via Latent Disagreement" |
| Open Source Code | Yes | Videos and code: https://ramanans1.github.io/plan2explore/ |
| Open Datasets | Yes | "Environment Details: We use the DM Control Suite (Tassa et al., 2018), a standard benchmark for continuous control." |
| Dataset Splits | No | The paper describes exploration steps and adaptation phases (zero-shot, few-shot) but does not specify explicit train/validation/test splits with percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (GPU models, CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using Dreamer (Hafner et al., 2020) as the base agent but does not provide version numbers for software dependencies such as Python, TensorFlow, or other libraries. |
| Experiment Setup | Yes | "We use (Hafner et al., 2020) with the original hyperparameters unless specified otherwise to optimize both exploration and task policies of Plan2Explore. We found that the additional capacity provided by increasing the hidden size of the GRU in the latent dynamics model to 400 and the deterministic and stochastic components of the latent space to 60 helped performance. For a fair comparison, we maintain this model size for Dreamer and other baselines. For latent disagreement, we use an ensemble of 5 one-step prediction models implemented as 2-hidden-layer MLPs. Full details are in the appendix." |
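The latent-disagreement objective described in the experiment setup can be sketched as follows: an ensemble of one-step prediction heads maps a latent state and action to a predicted next latent feature, and the variance across the ensemble's predictions serves as the intrinsic exploration reward. This is a minimal NumPy illustration, not the paper's implementation; the feature size, action size, MLP width, and the untrained random weights are assumptions for demonstration (the paper specifies only the ensemble size of 5 and the 2-hidden-layer MLP heads).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration only (the paper reports a latent space with
# 60-dim stochastic and 60-dim deterministic parts; action size varies by task).
FEAT = 120    # concatenated latent feature size (assumption)
ACT = 6       # action dimensionality (assumption)
HIDDEN = 200  # MLP hidden width (not specified in the excerpt; assumption)
ENSEMBLE = 5  # ensemble size, as stated in the setup

def init_mlp(in_dim, hidden, out_dim, rng):
    """Random (untrained) weights for a 2-hidden-layer MLP head."""
    return [
        (rng.normal(0.0, 0.1, (in_dim, hidden)), np.zeros(hidden)),
        (rng.normal(0.0, 0.1, (hidden, hidden)), np.zeros(hidden)),
        (rng.normal(0.0, 0.1, (hidden, out_dim)), np.zeros(out_dim)),
    ]

def mlp_forward(params, x):
    """Forward pass with ReLU on hidden layers, linear output."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Ensemble of one-step prediction models: (latent, action) -> next latent feature.
ensemble = [init_mlp(FEAT + ACT, HIDDEN, FEAT, rng) for _ in range(ENSEMBLE)]

def disagreement_reward(latent, action):
    """Intrinsic reward: ensemble variance of predicted next features,
    averaged over feature dimensions. High variance = high model uncertainty."""
    inp = np.concatenate([latent, action], axis=-1)
    preds = np.stack([mlp_forward(p, inp) for p in ensemble])  # (ENSEMBLE, FEAT)
    return preds.var(axis=0).mean()

r = disagreement_reward(rng.normal(size=FEAT), rng.normal(size=ACT))
```

In the actual method, each ensemble member is trained on a different bootstrap of the replay data, so the variance shrinks in well-explored regions of the latent space and the exploration policy is drawn toward states the world model has not yet learned.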