Active World Model Learning with Progress Curiosity
Authors: Kuno Kim, Megumi Sano, Julian De Freitas, Nick Haber, Daniel Yamins
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an AWML system driven by γ-Progress: a novel and scalable learning progress-based curiosity signal. We show that γ-Progress gives rise to an exploration policy that overcomes the white noise problem and achieves significantly higher AWML performance than state-of-the-art exploration strategies including Random Network Distillation (RND) (Burda et al., 2018b) and Model Disagreement (Pathak et al., 2019). |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Stanford University; ²Department of Psychology, Harvard University; ³Graduate School of Education, Stanford University; ⁴Department of Psychology, Stanford University. |
| Pseudocode | Yes | Algorithm 1: AWML with γ-Progress (a hedged sketch of this loop follows the table). |
| Open Source Code | No | The paper provides links to videos of the environment but does not explicitly state that the source code for the methodology is available or provide a link to a code repository. |
| Open Datasets | No | The paper states that it uses a "custom-built 3D virtual world environment" but does not provide concrete access information (link, DOI, or citation) that would make this environment or its data publicly available. |
| Dataset Splits | No | The paper mentions "validation losses" and that "For details on each behavior-specific validation case and metric computation, we refer readers to Appendix F." However, it does not provide specific dataset split percentages or counts for validation in the main text. |
| Hardware Specification | No | The acknowledgements mention "hardware donation from the NVIDIA Corporation", but this is a general statement and does not specify the exact GPU models, CPU models, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the DQN learning algorithm and describes network architectures (e.g., LSTM, MLP, fully-connected networks), but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Our controller φ is a two-layer fully-connected network with 512 hidden units that takes as input x_{t-2:t} and outputs estimated Q-values for 9 possible actions which rotate the curious agent at different velocities. φ is updated with the DQN (Mnih et al., 2013) learning algorithm using the cost: ... with γ = 0.9995 across all experiments. (A hedged sketch of this controller follows the table.) |
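For readers who want a concrete picture of the γ-Progress signal and the AWML loop named in the table, here is a minimal PyTorch-style sketch. It is not the authors' code: the world model, the MSE prediction loss, the `env.step`/`controller.act` interfaces, and the mixing-rate value are assumptions. Only the core idea follows the paper: an "old" world model tracks the "new" one as an exponential moving average, and the curiosity reward is (roughly) the old model's loss minus the new model's loss.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of gamma-Progress within an AWML step (not the authors' code).
# WorldModel, env, controller, and the MSE loss are placeholder assumptions.

def ema_update(old_model, new_model, gamma=0.999):  # mixing rate: assumed value
    """theta_old <- gamma * theta_old + (1 - gamma) * theta_new."""
    with torch.no_grad():
        for p_old, p_new in zip(old_model.parameters(), new_model.parameters()):
            p_old.mul_(gamma).add_((1.0 - gamma) * p_new)

def progress_reward(old_model, new_model, obs, target):
    """gamma-Progress: loss of the stale 'old' model minus loss of the 'new' model."""
    with torch.no_grad():
        return (F.mse_loss(old_model(obs), target)
                - F.mse_loss(new_model(obs), target)).item()

def awml_step(env, controller, new_model, old_model, optimizer, obs):
    """One schematic AWML iteration: act, fit the world model, score progress."""
    action = controller.act(obs)                  # curiosity-driven action choice
    next_obs = env.step(action)                   # hypothetical environment interface
    loss = F.mse_loss(new_model(obs), next_obs)   # world-model prediction loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    ema_update(old_model, new_model)              # old model slowly tracks the new one
    reward = progress_reward(old_model, new_model, obs, next_obs)
    return next_obs, reward                       # reward later trains the DQN controller
```

Because the old model lags the new one by a γ-controlled horizon, the reward stays near zero on unlearnable inputs, which matches the white-noise robustness claimed in the Research Type row.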
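The Experiment Setup row specifies the controller architecture precisely enough to sketch. Below is one plausible PyTorch reading, not the published implementation: `OBS_DIM` and the flattened stacking of x_{t-2:t} are assumptions, while the two fully-connected layers, 512 hidden units, 9 actions, and γ = 0.9995 come from the quoted text. The cost elided by "..." in the quote is rendered here as the standard DQN Bellman regression from Mnih et al. (2013).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the controller phi described in the table.
OBS_DIM = 128     # assumed per-step feature size; not stated in the quoted text
N_STACK = 3       # x_{t-2:t}: three consecutive observations, flattened
N_ACTIONS = 9     # rotation velocities, per the quoted setup
GAMMA = 0.9995    # discount in the DQN cost, per the quoted setup

class Controller(nn.Module):
    """Two-layer fully-connected network with 512 hidden units mapping the
    stacked observations x_{t-2:t} to Q-values, one per rotation action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM * N_STACK, 512),
            nn.ReLU(),
            nn.Linear(512, N_ACTIONS),
        )

    def forward(self, x):        # x: (batch, OBS_DIM * N_STACK)
        return self.net(x)       # estimated Q-values: (batch, N_ACTIONS)

def dqn_loss(q_net, target_net, obs, action, reward, next_obs):
    """Standard DQN cost: (Q(s, a) - (r + gamma * max_a' Q_target(s', a')))^2.
    `action` is a LongTensor of shape (batch,)."""
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + GAMMA * target_net(next_obs).max(dim=1).values
    return F.mse_loss(q, target)
```

Here the γ-Progress reward from the previous sketch would supply the `reward` term, so the controller learns to select rotations that maximize expected world-model learning progress.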