Recurrent World Models Facilitate Policy Evolution
Authors: David Ha, Jürgen Schmidhuber
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach can be used to solve a challenging race car navigation from pixels task that previously has not been solved using more traditional methods. In this section, we describe how we can train the Agent model described earlier to solve a car racing task. Table 1: CarRacing-v0 results over 100 trials. Table 2: DoomTakeCover-v0 results, varying τ. |
| Researcher Affiliation | Collaboration | David Ha Google Brain Tokyo, Japan hadavid@google.com Jürgen Schmidhuber NNAISENSE The Swiss AI Lab, IDSIA (USI & SUPSI) juergen@idsia.ch |
| Pseudocode | Yes | Figure 2: Flow diagram showing how V, M, and C interact with the environment (left). Pseudocode for how our agent model is used in the OpenAI Gym [5] environment (right). Algorithm 1 Training procedure in our experiments. |
| Open Source Code | No | Interactive version of paper: https://worldmodels.github.io. This is a link to an interactive demo page, not an explicit statement of open-source code for the methodology. |
| Open Datasets | No | To train V, we first collect a dataset of 10k random rollouts of the environment. The paper describes generating its own dataset from public environments, but does not provide concrete access information (link, DOI, repository, or formal citation) for this specific collected dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages or sample counts for training, validation, and testing). |
| Hardware Specification | No | The paper mentions training on "a single machine with multiple CPU cores" and "on a single GPU" but does not provide specific hardware details such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions software components like "Open AI Gym", "Variational Autoencoder (VAE)", "MDN-RNN", and "Covariance-Matrix Adaptation Evolution Strategy (CMA-ES)" but does not specify their version numbers. |
| Experiment Setup | Yes | During sampling, we can adjust a real-valued temperature parameter τ to control model uncertainty... Table 2: DoomTakeCover-v0 results, varying τ. The table provides concrete values for the hyperparameter τ (0.10, 0.50, 1.00, 1.15, 1.30). |
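
The τ values reported in the last row control uncertainty when sampling from the MDN-RNN's mixture output. As a minimal sketch (not the paper's implementation), one common convention is to divide the mixture logits by τ before the softmax and scale each component's standard deviation by √τ; the function name `sample_mdn` and the toy parameters below are illustrative assumptions:

```python
import math
import random

def sample_mdn(logits, mus, sigmas, tau=1.0, rng=None):
    """Sample from a 1-D Gaussian mixture with temperature tau.

    tau > 1 flattens the mixture weights and amplifies Gaussian noise
    (more uncertainty); tau -> 0 collapses toward the dominant mode.
    """
    rng = rng or random.Random(0)
    # Temperature-scale the mixture logits, then apply a stable softmax.
    m = max(l / tau for l in logits)
    exps = [math.exp(l / tau - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Pick a mixture component according to the scaled weights.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Scale the chosen component's std by sqrt(tau) as well.
    return rng.gauss(mus[k], sigmas[k] * math.sqrt(tau))

# Example with three components; tau=1.15 matches one Table 2 setting.
x = sample_mdn([2.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [0.1, 0.2, 0.3], tau=1.15)
```

With τ near 0 the sampler almost always returns the mean of the highest-logit component; raising τ toward the larger Table 2 values makes the dream environment noisier and, per the paper, harder for the agent to exploit.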