Curiosity-driven Exploration by Self-supervised Prediction
Authors: Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows the agent to reach the goal with far fewer interactions with the environment; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g., new levels of the same game), where knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch. |
| Researcher Affiliation | Academia | Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell (University of California, Berkeley). Correspondence to: Deepak Pathak <pathak@berkeley.edu>. |
| Pseudocode | No | The paper describes the proposed method using textual descriptions and a system diagram (Figure 2) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | At the end of the paper, it states: 'To further aid in this effort, we will make the code for our algorithm, as well as testing and environment setups freely available online.' This indicates a future release, not concrete access at the time of publication. |
| Open Datasets | Yes | Our first environment is the VizDoom (Kempka et al., 2016) game... Our testing setup in all the experiments is the DoomMyWayHome-v0 environment which is available as part of OpenAI Gym (Brockman et al., 2016). Our second environment is the classic Nintendo game Super Mario Bros with a reparameterized 14-dimensional action space following (Paquette, 2016). (A hedged environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper describes varying reward sparsity and spawning locations for experimental scenarios but does not provide specific percentages or counts for training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments; it only mentions general environments such as VizDoom and the use of A3C. |
| Software Dependencies | No | The paper mentions software components and environments such as 'OpenAI Gym', 'VizDoom', and 'A3C' along with citations, but it does not specify explicit version numbers for these software dependencies (e.g., 'OpenAI Gym vX.Y.Z'). |
| Experiment Setup | No | The paper defines the scaling factor (η) and weights (β, λ) used in the loss function, and mentions the maximum number of time steps per episode, but it does not provide concrete hyperparameter values such as learning rates, batch sizes, or specific optimizer settings for the A3C agent. (A hedged sketch of the η/β/λ-weighted ICM loss follows the table.) |
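
As a companion to the Open Datasets row, here is a minimal sketch of loading the sparse-reward Doom environment quoted above through OpenAI Gym. Only the environment ID comes from the paper's text; the Gym version behavior noted in the comments (bundled Doom environments vs. the ppaquette gym-doom plugin) and the random-policy rollout are assumptions for illustration, not setup details confirmed by the paper.

```python
import gym

# Environment ID as quoted in the paper ("DoomMyWayHome-v0", exposed through
# OpenAI Gym). Later Gym releases dropped the bundled Doom environments; with
# the ppaquette gym-doom plugin the ID is assumed to be
# 'ppaquette/DoomMyWayHome-v0' instead.
env = gym.make('DoomMyWayHome-v0')

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random policy, illustration only
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API (2016-era)
    total_reward += reward
env.close()
print('episode return:', total_reward)
```

As a companion to the Experiment Setup row, here is a minimal sketch of how the intrinsic reward and the weighted ICM loss described in the paper fit together: the intrinsic reward is the η-scaled forward-model prediction error, and the inverse- and forward-model losses are combined with weights (1 − β) and β, while the −λ·E[Σ r_t] policy term is handled by the A3C agent. The PyTorch framing, tensor names, and default numeric values are illustrative assumptions, not settings reported in the paper.

```python
import torch
import torch.nn.functional as F

def icm_losses(phi_next_pred, phi_next, action_logits, actions,
               eta=0.01, beta=0.2):
    # Intrinsic reward: scaled forward-model prediction error,
    # r^i_t = (eta / 2) * ||phi_hat(s_{t+1}) - phi(s_{t+1})||^2_2
    forward_err = 0.5 * (phi_next_pred - phi_next).pow(2).sum(dim=-1)
    intrinsic_reward = eta * forward_err.detach()

    # Forward-model loss L_F and inverse-model loss L_I (cross-entropy over
    # the discrete action predicted from the features of s_t and s_{t+1}).
    loss_forward = forward_err.mean()
    loss_inverse = F.cross_entropy(action_logits, actions)

    # Combined ICM loss, (1 - beta) * L_I + beta * L_F; the policy-gradient
    # term -lambda * E[sum_t r_t] is optimized separately by the A3C agent.
    icm_loss = (1.0 - beta) * loss_inverse + beta * loss_forward
    return intrinsic_reward, icm_loss
```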
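In a typical training loop, `intrinsic_reward` would be added to (or, in the no-extrinsic-reward setting, replace) the environment reward before computing the A3C advantage, and `icm_loss` would be backpropagated through the feature encoder, inverse model, and forward model; this wiring is an assumption based on the paper's description rather than released code.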