Active World Model Learning with Progress Curiosity
Authors: Kuno Kim, Megumi Sano, Julian De Freitas, Nick Haber, Daniel Yamins
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an AWML system driven by γ-Progress: a novel and scalable learning progress-based curiosity signal. We show that γ-Progress gives rise to an exploration policy that overcomes the white noise problem and achieves significantly higher AWML performance than state-of-the-art exploration strategies including Random Network Distillation (RND) (Burda et al., 2018b) and Model Disagreement (Pathak et al., 2019). |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Stanford University; ²Department of Psychology, Harvard University; ³Graduate School of Education, Stanford University; ⁴Department of Psychology, Stanford University. |
| Pseudocode | Yes | Algorithm 1: AWML with γ-Progress (a hedged sketch of this loop follows the table). |
| Open Source Code | No | The paper provides links to videos of the environment but does not explicitly state that the source code for the methodology is available or provide a link to a code repository. |
| Open Datasets | No | The paper states that it uses a "custom-built 3D virtual world environment" but does not provide concrete access information (link, DOI, or citation) that would make this environment or its data publicly available. |
| Dataset Splits | No | The paper mentions "validation losses" and that "For details on each behavior-specific validation case and metric computation, we refer readers to Appendix F." However, it does not provide specific dataset split percentages or counts for validation in the main text. |
| Hardware Specification | No | The acknowledgements mention "hardware donation from the NVIDIA Corporation", but this is a general statement and does not specify the exact GPU models, CPU models, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the DQN learning algorithm and describes network architectures (e.g., LSTM, MLP, fully-connected networks), but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Our controller φ is a two-layer fully-connected network with 512 hidden units that takes as input x_{t-2:t} and outputs estimated Q-values for 9 possible actions which rotate the curious agent at different velocities. φ is updated with the DQN (Mnih et al., 2013) learning algorithm using the cost: ... with γ = 0.9995 across all experiments. (A hedged sketch of this controller follows the table.) |
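For readers who want a concrete picture of the γ-Progress signal and the AWML loop named in the table, here is a minimal PyTorch-style sketch. It is not the authors' code: the world model, the MSE prediction loss, the `env.step`/`controller.act` interfaces, and the mixing-rate value are assumptions. Only the core idea follows the paper: an "old" world model tracks the "new" one as an exponential moving average, and the curiosity reward is (roughly) the old model's loss minus the new model's loss.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of gamma-Progress within an AWML step (not the authors' code).
# WorldModel, env, controller, and the MSE loss are placeholder assumptions.

def ema_update(old_model, new_model, gamma=0.999):  # mixing rate: assumed value
    """theta_old <- gamma * theta_old + (1 - gamma) * theta_new."""
    with torch.no_grad():
        for p_old, p_new in zip(old_model.parameters(), new_model.parameters()):
            p_old.mul_(gamma).add_((1.0 - gamma) * p_new)

def progress_reward(old_model, new_model, obs, target):
    """gamma-Progress: loss of the stale 'old' model minus loss of the 'new' model."""
    with torch.no_grad():
        return (F.mse_loss(old_model(obs), target)
                - F.mse_loss(new_model(obs), target)).item()

def awml_step(env, controller, new_model, old_model, optimizer, obs):
    """One schematic AWML iteration: act, fit the world model, score progress."""
    action = controller.act(obs)                  # curiosity-driven action choice
    next_obs = env.step(action)                   # hypothetical environment interface
    loss = F.mse_loss(new_model(obs), next_obs)   # world-model prediction loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    ema_update(old_model, new_model)              # old model slowly tracks the new one
    reward = progress_reward(old_model, new_model, obs, next_obs)
    return next_obs, reward                       # reward later trains the DQN controller
```

Because the old model lags the new one by a γ-controlled horizon, the reward stays near zero on unlearnable inputs, which matches the white-noise robustness claimed in the Research Type row.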
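The Experiment Setup row specifies the controller architecture precisely enough to sketch. Below is one plausible PyTorch reading, not the published implementation: `OBS_DIM` and the flattened stacking of x_{t-2:t} are assumptions, while the two fully-connected layers, 512 hidden units, 9 actions, and γ = 0.9995 come from the quoted text. The cost elided by "..." in the quote is rendered here as the standard DQN Bellman regression from Mnih et al. (2013).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the controller phi described in the table.
OBS_DIM = 128     # assumed per-step feature size; not stated in the quoted text
N_STACK = 3       # x_{t-2:t}: three consecutive observations, flattened
N_ACTIONS = 9     # rotation velocities, per the quoted setup
GAMMA = 0.9995    # discount in the DQN cost, per the quoted setup

class Controller(nn.Module):
    """Two-layer fully-connected network with 512 hidden units mapping the
    stacked observations x_{t-2:t} to Q-values, one per rotation action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM * N_STACK, 512),
            nn.ReLU(),
            nn.Linear(512, N_ACTIONS),
        )

    def forward(self, x):        # x: (batch, OBS_DIM * N_STACK)
        return self.net(x)       # estimated Q-values: (batch, N_ACTIONS)

def dqn_loss(q_net, target_net, obs, action, reward, next_obs):
    """Standard DQN cost: (Q(s, a) - (r + gamma * max_a' Q_target(s', a')))^2.
    `action` is a LongTensor of shape (batch,)."""
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + GAMMA * target_net(next_obs).max(dim=1).values
    return F.mse_loss(q, target)
```

Here the γ-Progress reward from the previous sketch would supply the `reward` term, so the controller learns to select rotations that maximize expected world-model learning progress.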