Self-Supervised Exploration via Disagreement
Authors: Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. |
| Researcher Affiliation | Collaboration | Deepak Pathak*¹, Dhiraj Gandhi*², Abhinav Gupta²,³. ¹UC Berkeley, ²CMU, ³Facebook AI Research. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/. |
| Open Datasets | Yes | We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/. |
| Dataset Splits | Yes | Out of a total of 30 objects, we created a set of 20 objects for training and 10 objects for testing. |
| Hardware Specification | No | The paper does not provide specific details on the computational hardware (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions PPO, MuJoCo, and Unity ML-Agents, but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In particular, we use random feature space in all video games and navigation, classification features in MNIST and ImageNet-pretrained ResNet-18 features in real world robot experiments. We use 5 models in the ensemble. |
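
The Experiment Setup row mentions an ensemble of 5 models. As a rough illustration of how disagreement across such an ensemble can act as an exploration signal, the sketch below scores an input by the variance of the ensemble's predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the random linear "models", the 4-dimensional feature vector, and the names `make_linear_model` and `disagreement_reward` are hypothetical stand-ins for learned forward-dynamics models operating on embedded state-action features.

```python
import random
from statistics import pvariance

NUM_MODELS = 5  # ensemble size reported in the Experiment Setup row


def make_linear_model(feature_dim, seed):
    """Hypothetical stand-in for one learned forward model (random linear map)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
               for _ in range(feature_dim)]

    def predict(features):
        # Predicted next-state features: one dot product per output dimension.
        return [sum(w * x for w, x in zip(row, features)) for row in weights]

    return predict


def disagreement_reward(models, features):
    """Variance across ensemble predictions, averaged over feature dimensions."""
    predictions = [model(features) for model in models]
    # zip(*predictions) groups the ensemble's outputs by feature dimension.
    per_dim_variance = [pvariance(values) for values in zip(*predictions)]
    return sum(per_dim_variance) / len(per_dim_variance)


# Assumed toy input: a 4-dimensional feature vector.
features = [0.5, -1.0, 0.25, 2.0]

# Differently initialized models disagree, so the reward is positive.
models = [make_linear_model(feature_dim=4, seed=s) for s in range(NUM_MODELS)]
reward = disagreement_reward(models, features)

# Five copies of the same model agree exactly, so the reward vanishes.
agreeing_models = [models[0]] * NUM_MODELS
zero_reward = disagreement_reward(agreeing_models, features)
```

In this toy setup the reward is high where the models' predictions diverge and falls to zero once they agree, which is the property that makes ensemble variance usable as an intrinsic reward.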