Curiosity-driven Exploration by Self-supervised Prediction

Authors: Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

ICML 2017

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity lets the agent reach the goal with far fewer interactions with the environment; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g., new levels of the same game), where knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch.
Researcher Affiliation | Academia | Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell (University of California, Berkeley). Correspondence to: Deepak Pathak <pathak@berkeley.edu>.
Pseudocode | No | The paper describes the proposed method with textual descriptions and a system diagram (Figure 2) but includes no explicit pseudocode or algorithm blocks. (A hedged sketch of the method's objective and losses appears below the table.)
Open Source Code | No | At the end of the paper, it states: 'To further aid in this effort, we will make the code for our algorithm, as well as testing and environment setups freely available online.' This promises a future release rather than providing concrete access at the time of publication.
Open Datasets | Yes | 'Our first environment is the VizDoom (Kempka et al., 2016) game... Our testing setup in all the experiments is the DoomMyWayHome-v0 environment which is available as part of OpenAI Gym (Brockman et al., 2016). Our second environment is the classic Nintendo game Super Mario Bros with a reparameterized 14-dimensional action space following (Paquette, 2016).' (A usage sketch for loading the environment follows below.)
Dataset Splits | No | The paper describes varying reward sparsity and spawning locations for its experimental scenarios but gives no percentages or counts for training, validation, and test dataset splits.
Hardware Specification | No | The paper gives no specifics about the hardware used to run the experiments (e.g., GPU models, CPU types, memory); it only mentions the environments (VizDoom, Super Mario Bros) and the use of A3C.
Software Dependencies | No | The paper cites software components and environments such as OpenAI Gym, VizDoom, and A3C, but it does not pin version numbers for these dependencies (e.g., 'OpenAI Gym vX.Y.Z').
Experiment Setup | No | The paper defines the scaling factor (η) and loss weights (β, λ) used in the objective, and mentions maximum time steps per episode, but it does not provide concrete hyperparameter values such as learning rates, batch sizes, or optimizer settings for the A3C agent. (The objective these symbols enter is written out below the table.)
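For context on the Pseudocode and Experiment Setup rows, the paper's curiosity objective can be written out from the quantities it defines. The reconstruction below follows the paper's equations for the intrinsic reward and the joint objective; η, β, and λ are the symbols quoted in the table, and φ̂(s_{t+1}) denotes the forward model's predicted next-state feature.

```latex
% Intrinsic reward: scaled prediction error of the forward dynamics model.
r^{i}_{t} = \frac{\eta}{2}\,\big\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \big\rVert_{2}^{2}

% Joint objective over policy (\theta_P), inverse-model (\theta_I),
% and forward-model (\theta_F) parameters:
\min_{\theta_P,\,\theta_I,\,\theta_F}\;
  -\lambda\,\mathbb{E}_{\pi(s_t;\theta_P)}\Big[\sum\nolimits_{t} r_t\Big]
  \;+\; (1-\beta)\,L_I \;+\; \beta\,L_F
```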
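A minimal sketch of how the inverse loss L_I and forward loss L_F might be computed, written in PyTorch. This is not the authors' implementation: the linear encoder, layer sizes, default η and β values, and the choice to detach the forward-model target are illustrative assumptions; only the loss structure ((1−β)L_I + βL_F plus the scaled prediction-error reward) comes from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    """Sketch of an Intrinsic Curiosity Module; sizes are placeholders."""

    def __init__(self, obs_dim, n_actions, feat_dim=256, eta=0.01, beta=0.2):
        super().__init__()
        self.eta, self.beta = eta, beta
        # phi: embeds observations into a feature space (the paper uses a
        # small conv net on pixels; a linear layer stands in here).
        self.phi = nn.Linear(obs_dim, feat_dim)
        # Inverse model g: predicts a_t from (phi(s_t), phi(s_{t+1})).
        self.inverse = nn.Linear(2 * feat_dim, n_actions)
        # Forward model f: predicts phi(s_{t+1}) from (phi(s_t), a_t).
        self.forward_model = nn.Linear(feat_dim + n_actions, feat_dim)

    def forward(self, s_t, s_next, a_t):
        phi_t, phi_next = self.phi(s_t), self.phi(s_next)
        a_onehot = F.one_hot(a_t, num_classes=self.inverse.out_features).float()
        # Inverse loss L_I: cross-entropy on the predicted discrete action.
        logits = self.inverse(torch.cat([phi_t, phi_next], dim=-1))
        loss_inverse = F.cross_entropy(logits, a_t)
        # Forward loss L_F = 1/2 ||f(phi(s_t), a_t) - phi(s_{t+1})||^2.
        # Detaching the target so only the inverse loss shapes phi is an
        # implementation assumption, not something the table above asserts.
        phi_pred = self.forward_model(torch.cat([phi_t, a_onehot], dim=-1))
        loss_forward = 0.5 * (phi_pred - phi_next.detach()).pow(2).sum(-1).mean()
        # Intrinsic reward r^i_t = (eta / 2) * prediction error (no gradient).
        with torch.no_grad():
            r_int = 0.5 * self.eta * (phi_pred - phi_next).pow(2).sum(-1)
        # Combined ICM loss (1 - beta) * L_I + beta * L_F, as in the paper.
        return (1 - self.beta) * loss_inverse + self.beta * loss_forward, r_int
```

In use, `r_int` would be added to the extrinsic reward before the A3C update, and the returned loss optimized jointly with the policy loss, matching the joint objective written out above.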
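As a usage note for the Open Datasets row: at the time, the Doom tasks were exposed through a third-party OpenAI Gym plugin. The module name and environment id below are assumptions based on the paper's citation of (Paquette, 2016) and may not match current package versions; the Gym calls themselves follow the old 4-tuple `step` API of that era.

```python
import gym
# Hypothetical: the ppaquette gym-doom plugin registered the Doom tasks
# with Gym on import; both the module name and the environment id below
# are assumptions, not verified against a current install.
import ppaquette_gym_doom  # noqa: F401  (registers Doom environments)

env = gym.make("ppaquette/DoomMyWayHome-v0")  # id is an assumption
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
env.close()
```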