Self-Supervised Exploration via Disagreement

Authors: Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch.
Researcher Affiliation | Collaboration | Deepak Pathak* (1), Dhiraj Gandhi* (2), Abhinav Gupta (2, 3). (1) UC Berkeley, (2) CMU, (3) Facebook AI Research.
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/.
Open Datasets | Yes | We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/.
Dataset Splits | Yes | Out of a total of 30 objects, we created a set of 20 objects for training and 10 objects for testing.
Hardware Specification | No | The paper does not provide specific details on the computational hardware (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper mentions software such as PPO, Mujoco, and Unity ML-agent, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In particular, we use random feature space in all video games and navigation, classification features in MNIST and ImageNet-pretrained ResNet-18 features in real world robot experiments. We use 5 models in the ensemble.
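
To make the ensemble setup concrete, here is a minimal sketch of a disagreement-style exploration signal. It is not the authors' implementation: it assumes that disagreement is measured as the variance of next-feature predictions across the ensemble, and it replaces the learned forward models with fixed random linear maps. The names `make_random_linear_model`, `predict` and `disagreement_reward`, and the dimensions `FEATURE_DIM` and `ACTION_DIM`, are hypothetical; only the ensemble size of 5 comes from the setup above.

```python
import random

NUM_MODELS = 5   # ensemble size reported in the experiment setup
FEATURE_DIM = 4  # hypothetical feature dimension, chosen for illustration
ACTION_DIM = 2   # hypothetical action dimension, chosen for illustration


def make_random_linear_model(in_dim, out_dim, rng):
    """Return a random weight matrix standing in for one forward model.

    Assumption: a real model would be trained; this one is only a placeholder.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def predict(weights, features, action):
    """Predict next-step features from the current features and the action."""
    x = list(features) + list(action)
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def disagreement_reward(ensemble, features, action):
    """Variance of the ensemble predictions, averaged over feature dimensions."""
    preds = [predict(weights, features, action) for weights in ensemble]
    n = len(preds)
    dim = len(preds[0])
    total = 0.0
    for d in range(dim):
        column = [p[d] for p in preds]
        mean = sum(column) / n
        total += sum((c - mean) ** 2 for c in column) / n
    return total / dim


rng = random.Random(0)
ensemble = [
    make_random_linear_model(FEATURE_DIM + ACTION_DIM, FEATURE_DIM, rng)
    for _ in range(NUM_MODELS)
]
features = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
action = [rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
reward = disagreement_reward(ensemble, features, action)
print(f"intrinsic reward (ensemble variance): {reward:.4f}")
```

Under this assumption the signal is high where the models disagree and falls toward zero once all models make the same prediction, which is why an ensemble of identical models would yield no exploration bonus.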