Active World Model Learning with Progress Curiosity

Authors: Kuno Kim, Megumi Sano, Julian De Freitas, Nick Haber, Daniel Yamins

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We propose an AWML system driven by γ-Progress: a novel and scalable learning progress-based curiosity signal. We show that γ-Progress gives rise to an exploration policy that overcomes the white noise problem and achieves significantly higher AWML performance than state-of-the-art exploration strategies including Random Network Distillation (RND) (Burda et al., 2018b) and Model Disagreement (Pathak et al., 2019)."
Researcher Affiliation | Academia | 1 Department of Computer Science, Stanford University; 2 Department of Psychology, Harvard University; 3 Graduate School of Education, Stanford University; 4 Department of Psychology, Stanford University.
Pseudocode | Yes | Algorithm 1: AWML with γ-Progress.
Open Source Code | No | The paper provides links to videos of the environment but does not state that the source code for the methodology is available, nor does it link to a code repository.
Open Datasets | No | The paper states that it uses a "custom-built 3D virtual world environment" but provides no concrete access information (link, DOI, or citation) that would make this environment publicly available.
Dataset Splits | No | The paper mentions "validation losses" and notes that "For details on each behavior-specific validation case and metric computation, we refer readers to Appendix F," but the main text gives no dataset split percentages or counts.
Hardware Specification | No | The acknowledgements mention a "hardware donation from the NVIDIA Corporation," but this is a general statement and does not specify the GPU models, CPU models, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions the DQN learning algorithm and describes network architectures (e.g., LSTM, MLP, fully connected networks), but it does not list software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | "Our controller φ is a two-layer fully-connected network with 512 hidden units that takes as input x_{t−2:t} and outputs estimated Q-values for 9 possible actions which rotate the curious agent at different velocities. φ is updated with the DQN (Mnih et al., 2013) learning algorithm using the cost: ... with γ = 0.9995 across all experiments."
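The experiment-setup quote above can be sketched in code. The following is a minimal illustration, not the authors' implementation: a two-layer fully connected Q-network with 512 hidden units over 9 rotation actions, plus the standard one-step DQN bootstrap target with a discount of 0.9995. The class and function names (`QNetwork`, `dqn_target`) and the observation dimensionality are illustrative assumptions; the paper does not specify them.

```python
# Hedged sketch of the controller described in the quote: a two-layer
# fully-connected Q-network (512 hidden units, 9 actions) and the
# standard DQN bootstrap target with gamma = 0.9995.
# All names and the observation size are illustrative, not from the paper.
import numpy as np


class QNetwork:
    """Two-layer fully connected network mapping a (flattened)
    observation window to Q-values for 9 rotation actions."""

    def __init__(self, obs_dim, hidden=512, n_actions=9, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.01, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.01, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def __call__(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2                # Q-values, shape (n_actions,)


def dqn_target(reward, next_q, gamma=0.9995, done=False):
    """One-step DQN target: r + gamma * max_a' Q(s', a')."""
    return reward + (0.0 if done else gamma * float(np.max(next_q)))


# Usage: a 3-step observation window, flattened (size is an assumption).
q = QNetwork(obs_dim=3 * 64)
obs = np.zeros(3 * 64)
action = int(np.argmax(q(obs)))          # greedy choice among the 9 rotations
target = dqn_target(reward=1.0, next_q=q(obs))
```

In a full DQN loop the squared difference between `q(obs)[action]` and `target` would be minimized by gradient descent; that optimization step is omitted here for brevity.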