reproducibilityindex.ai

Recurrent World Models Facilitate Policy Evolution

Authors: David Ha, Jürgen Schmidhuber

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show that our approach can be used to solve a challenging race car navigation from pixels task that previously has not been solved using more traditional methods. In this section, we describe how we can train the Agent model described earlier to solve a car racing task. Table 1: Car Racing-v0 results over 100 trials. Table 2: Doom Take Cover-v0 results, varying τ.
Researcher Affiliation	Collaboration	David Ha Google Brain Tokyo, Japan hadavid@google.com Jürgen Schmidhuber NNAISENSE The Swiss AI Lab, IDSIA (USI & SUPSI) juergen@idsia.ch
Pseudocode	Yes	Figure 2: Flow diagram showing how V, M, and C interacts with the environment (left). Pseudocode for how our agent model is used in the Open AI Gym [5] environment (right). Algorithm 1 Training procedure in our experiments.
Open Source Code	No	Interactive version of paper: https://worldmodels.github.io. This is a link to an interactive demo page, not an explicit statement of open-source code for the methodology.
Open Datasets	No	To train V, we first collect a dataset of 10k random rollouts of the environment. The paper describes generating its own dataset from public environments, but does not provide concrete access information (link, DOI, repository, or formal citation) for this specific collected dataset.
Dataset Splits	No	The paper does not provide specific dataset split information (e.g., exact percentages or sample counts for training, validation, and testing).
Hardware Specification	No	The paper mentions training on "a single machine with multiple CPU cores" and "on a single GPU" but does not provide specific hardware details such as CPU or GPU models.
Software Dependencies	No	The paper mentions software components like "Open AI Gym", "Variational Autoencoder (VAE)", "MDN-RNN", and "Covariance-Matrix Adaptation Evolution Strategy (CMA-ES)" but does not specify their version numbers.
Experiment Setup	Yes	During sampling, we can adjust a real-valued temperature parameter τ to control model uncertainty... Table 2: Doom Take Cover-v0 results, varying τ. The table provides concrete values for the hyperparameter τ (0.10, 0.50, 1.00, 1.15, 1.30).
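
The temperature parameter τ in the Experiment Setup row can be illustrated with a minimal sketch of temperature-adjusted sampling from a one-dimensional Gaussian mixture. The convention used here (mixture logits divided by τ, standard deviations scaled by √τ) is an assumption common in mixture density network code, not something the summary above confirms, and the names sample_mdn, log_pi, mu, and log_sigma are hypothetical.

```python
import numpy as np


def sample_mdn(log_pi, mu, log_sigma, tau=1.0, rng=None):
    """Draw one sample from a 1-D Gaussian mixture at temperature tau.

    Assumed convention (not taken from the table above): mixture logits
    are divided by tau and each standard deviation is scaled by sqrt(tau).
    """
    rng = np.random.default_rng() if rng is None else rng
    log_pi = np.asarray(log_pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_sigma = np.asarray(log_sigma, dtype=float)

    # Temperature-scaled softmax over the mixture weights,
    # shifted by the maximum for numerical stability.
    scaled = log_pi / tau
    scaled = scaled - scaled.max()
    pi = np.exp(scaled)
    pi = pi / pi.sum()

    # Pick a mixture component, then sample from its widened or
    # narrowed Gaussian.
    k = rng.choice(len(pi), p=pi)
    sigma = np.exp(log_sigma[k]) * np.sqrt(tau)
    return float(mu[k] + sigma * rng.standard_normal())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Temperature values listed in the Experiment Setup row.
    for tau in (0.10, 0.50, 1.00, 1.15, 1.30):
        draws = [sample_mdn([0.0, -1.0], [1.0, -1.0], [-1.0, -1.0], tau, rng)
                 for _ in range(1000)]
        print(f"tau={tau:.2f}  sample std={np.std(draws):.3f}")
```

Under this assumed convention, a small τ concentrates samples near the mode of the most likely component, while a larger τ flattens the mixture weights and widens each component, so the sampled predictions become more uncertain.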