Recurrent World Models Facilitate Policy Evolution

Authors: David Ha, Jürgen Schmidhuber

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our approach can be used to solve a challenging race car navigation from pixels task that previously has not been solved using more traditional methods. In this section, we describe how we can train the Agent model described earlier to solve a car racing task. Table 1: CarRacing-v0 results over 100 trials. Table 2: DoomTakeCover-v0 results, varying τ.
Researcher Affiliation | Collaboration | David Ha, Google Brain, Tokyo, Japan, hadavid@google.com; Jürgen Schmidhuber, NNAISENSE, The Swiss AI Lab, IDSIA (USI & SUPSI), juergen@idsia.ch
Pseudocode | Yes | Figure 2: Flow diagram showing how V, M, and C interacts with the environment (left). Pseudocode for how our agent model is used in the OpenAI Gym [5] environment (right). Algorithm 1 Training procedure in our experiments.
Open Source Code | No | Interactive version of paper: https://worldmodels.github.io. This is a link to an interactive demo page, not an explicit statement of open-source code for the methodology.
Open Datasets | No | To train V, we first collect a dataset of 10k random rollouts of the environment. The paper describes generating its own dataset from public environments, but does not provide concrete access information (link, DOI, repository, or formal citation) for this specific collected dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages or sample counts for training, validation, and testing).
Hardware Specification | No | The paper mentions training on "a single machine with multiple CPU cores" and "on a single GPU" but does not provide specific hardware details such as CPU or GPU models.
Software Dependencies | No | The paper mentions software components like "OpenAI Gym", "Variational Autoencoder (VAE)", "MDN-RNN", and "Covariance-Matrix Adaptation Evolution Strategy (CMA-ES)" but does not specify their version numbers.
Experiment Setup | Yes | During sampling, we can adjust a real-valued temperature parameter τ to control model uncertainty... Table 2: DoomTakeCover-v0 results, varying τ. The table provides concrete values for the hyperparameter τ (0.10, 0.50, 1.00, 1.15, 1.30).
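
The role of the temperature parameter τ noted under Experiment Setup can be illustrated with a short sketch. The summary above does not say how τ enters the sampling step, so the code below assumes one common convention for mixture density networks: the mixture logits are divided by τ and each component's standard deviation is scaled by sqrt(τ). The function name `sample_mdn` and the toy mixture parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_mdn(logits, mu, log_sigma, tau=1.0, rng=None):
    """Draw one sample from a 1-D Gaussian mixture at temperature tau.

    Assumed convention (not confirmed by the source): logits are divided
    by tau, and each component's standard deviation is scaled by sqrt(tau).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Temperature-scaled, numerically stable softmax over mixture weights.
    scaled = np.asarray(logits, dtype=float) / tau
    scaled -= scaled.max()
    pi = np.exp(scaled)
    pi /= pi.sum()
    # Pick a mixture component, then sample from its widened/narrowed Gaussian.
    k = rng.choice(len(pi), p=pi)
    sigma = np.exp(log_sigma[k]) * np.sqrt(tau)
    return mu[k] + sigma * rng.standard_normal()

# Toy mixture (hypothetical values), sampled at the lowest and highest
# temperatures listed in Table 2.
logits = [2.0, 0.0, -1.0]
mu = [-1.0, 0.0, 1.0]
log_sigma = [-1.0, -1.0, -1.0]
rng = np.random.default_rng(0)
low = np.array([sample_mdn(logits, mu, log_sigma, 0.10, rng) for _ in range(2000)])
high = np.array([sample_mdn(logits, mu, log_sigma, 1.30, rng) for _ in range(2000)])
print(low.std(), high.std())
```

Under this convention, a low τ concentrates samples near the most likely mixture component, while a higher τ spreads them across components and widens each Gaussian, which is one way to read "controlling model uncertainty".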