Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Active World Model Learning with Progress Curiosity

Authors: Kuno Kim, Megumi Sano, Julian De Freitas, Nick Haber, Daniel Yamins

ICML 2020 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an AWML system driven by γ-Progress: a novel and scalable learning progress-based curiosity signal. We show that γ-Progress gives rise to an exploration policy that overcomes the white noise problem and achieves significantly higher AWML performance than state-of-the-art exploration strategies including Random Network Distillation (RND) (Burda et al., 2018b) and Model Disagreement (Pathak et al., 2019). |
| Researcher Affiliation | Academia | (1) Department of Computer Science, Stanford University; (2) Department of Psychology, Harvard University; (3) Graduate School of Education, Stanford University; (4) Department of Psychology, Stanford University. |
| Pseudocode | Yes | Algorithm 1: AWML with γ-Progress |
| Open Source Code | No | The paper provides links to videos of the environment but does not explicitly state that the source code for the methodology is available, nor does it link to a code repository. |
| Open Datasets | No | The paper states that it uses a "custom-built 3D virtual world environment" but does not provide any concrete access information (link, DOI, or citation) that would make this dataset publicly available. |
| Dataset Splits | No | The paper mentions "validation losses" and states that "For details on each behavior-specific validation case and metric computation, we refer readers to Appendix F." However, it does not provide specific dataset split percentages or counts for validation in the main text. |
| Hardware Specification | No | The acknowledgements mention a "hardware donation from the NVIDIA Corporation", but this is a general statement and does not specify the exact GPU models, CPU models, or other detailed hardware specifications used to run the experiments. |
| Software Dependencies | No | The paper mentions using the DQN learning algorithm and describes network architectures (e.g., LSTM, MLP, fully-connected networks), but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Our controller φ is a two-layer fully-connected network with 512 hidden units that takes as input x_{t-2:t} and outputs estimated Q-values for 9 possible actions which rotate the curious agent at different velocities. φ is updated with the DQN (Mnih et al., 2013) learning algorithm using the cost: ... with γ = 0.9995 across all experiments. |
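To make the quoted γ-Progress idea concrete, here is a minimal NumPy sketch: the curiosity reward is the loss of a slowly updated "old" world model minus the loss of the current "new" model, where the old model's parameters track the new one via an exponential moving average. This is an illustrative toy with a linear world model, not the paper's implementation; in particular, reusing γ = 0.9995 as the EMA mixing rate here is an assumption for the sketch, and all names (`predict`, `loss`, `W_new`, `W_old`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear world model: predicts the next observation from the current one.
def predict(W, x):
    return W @ x

def loss(W, x, x_next):
    err = predict(W, x) - x_next
    return float(err @ err)

dim = 4
W_new = rng.normal(scale=0.1, size=(dim, dim))  # current ("new") world model
W_old = W_new.copy()                            # slow "old" copy of the model
gamma = 0.9995  # EMA mixing rate (assumption: reused from the quoted gamma)
lr = 0.05

true_W = np.eye(dim) * 0.9  # hidden dynamics of the toy environment

rewards = []
for step in range(2000):
    x = rng.normal(size=dim)
    x_next = true_W @ x

    # gamma-Progress curiosity reward: old-model loss minus new-model loss.
    r = loss(W_old, x, x_next) - loss(W_new, x, x_next)
    rewards.append(r)

    # One SGD step on the new model (least-squares gradient).
    grad = 2.0 * np.outer(predict(W_new, x) - x_next, x)
    W_new -= lr * grad

    # The old model slowly tracks the new one via an exponential moving average.
    W_old = gamma * W_old + (1.0 - gamma) * W_new

# While the new model learns faster than its slow copy, the progress
# reward stays positive on average; it fades as learning saturates,
# which is how this signal avoids the white-noise trap (unlearnable
# inputs yield no progress, hence no reward).
print(np.mean(rewards[:100]))
```

In the paper, this reward drives the DQN controller φ quoted above (a two-layer fully-connected network with 512 hidden units over 9 rotation actions); the sketch only illustrates how the reward itself is computed.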