Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Active World Model Learning with Progress Curiosity
Authors: Kuno Kim, Megumi Sano, Julian De Freitas, Nick Haber, Daniel Yamins
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an AWML system driven by γ-Progress: a novel and scalable learning progress-based curiosity signal. We show that γ-Progress gives rise to an exploration policy that overcomes the white noise problem and achieves significantly higher AWML performance than state-of-the-art exploration strategies including Random Network Distillation (RND) (Burda et al., 2018b) and Model Disagreement (Pathak et al., 2019). |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University 2Department of Psychology, Harvard University 3Graduate School of Education, Stanford University 4Department of Psychology, Stanford University. |
| Pseudocode | Yes | Algorithm 1 AWML with γ-Progress |
| Open Source Code | No | The paper provides links to videos of the environment but does not explicitly state that the source code for the methodology is available or provide a link to a code repository. |
| Open Datasets | No | The paper states that it uses a "custom-built 3D virtual world environment" but does not provide any concrete access information (link, DOI, citation) to make this dataset publicly available. |
| Dataset Splits | No | The paper mentions "validation losses" and that "For details on each behavior-specific validation case and metric computation, we refer readers to Appendix F." However, it does not provide specific dataset split percentages or counts for validation in the main text. |
| Hardware Specification | No | The acknowledgements mention "hardware donation from the NVIDIA Corporation", but this is a general statement and does not specify the exact GPU models, CPU models, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the DQN learning algorithm and describes network architectures (e.g., LSTM, MLP, fully-connected networks), but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Our controller φ is a two-layer fully-connected network with 512 hidden units that takes as input x_{t-2:t} and outputs estimated Q-values for 9 possible actions which rotate the curious agent at different velocities. φ is updated with the DQN (Mnih et al., 2013) learning algorithm using the cost: ... with γ = 0.9995 across all experiments. |
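
The controller described in the Experiment Setup row can be illustrated with a minimal sketch: a small fully-connected network that maps an observation window to 9 Q-values, followed by greedy action selection. This is not the authors' code. The input size (`INPUT_DIM`), the ReLU activation, the weight initialization, and the reading of "two-layer" as one hidden layer plus one output layer are all assumptions made for illustration; only the 512 hidden units and the 9 actions come from the table. The DQN update itself is omitted because the quoted cost is elided.

```python
import random

NUM_ACTIONS = 9     # from the table: 9 rotation actions
HIDDEN_UNITS = 512  # from the table: 512 hidden units
INPUT_DIM = 12      # hypothetical: size of the flattened observation window


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer (assumed initialization)."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, layer):
    """Apply a dense layer: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise ReLU (the activation is an assumption)."""
    return [max(0.0, v) for v in values]


def q_values(x, params):
    """Forward pass: hidden layer with ReLU, then a linear Q-value head."""
    hidden = relu(linear(x, params[0]))
    return linear(hidden, params[1])


def greedy_action(x, params):
    """Return the index of the action with the highest estimated Q-value."""
    q = q_values(x, params)
    return max(range(len(q)), key=q.__getitem__)


rng = random.Random(0)
params = [
    init_layer(INPUT_DIM, HIDDEN_UNITS, rng),
    init_layer(HIDDEN_UNITS, NUM_ACTIONS, rng),
]
observation = [rng.uniform(-1.0, 1.0) for _ in range(INPUT_DIM)]
q = q_values(observation, params)
action = greedy_action(observation, params)
```

In practice such a network would be written with a deep learning framework and trained with the DQN objective; the pure-Python version above only shows the shape of the computation, from observation to one Q-value per rotation action.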