When to Trust Your Model: Model-Based Policy Optimization
Authors: Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation aims to study two primary questions: (1) How well does MBPO perform on benchmark reinforcement learning tasks, compared to state-of-the-art model-based and model-free algorithms? (2) What conclusions can we draw about appropriate model usage? (Figure 2: Training curves of MBPO and five baselines on continuous control benchmarks. Solid curves depict the mean of five trials and shaded regions correspond to standard deviation among trials.) |
| Researcher Affiliation | Academia | Michael Janner Justin Fu Marvin Zhang Sergey Levine University of California, Berkeley {janner, justinjfu, marvin, svlevine}@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1: Monotonic Model-Based Policy Optimization; Algorithm 2: Model-Based Policy Optimization with Deep Reinforcement Learning (a hedged sketch of the Algorithm 2 loop is given below the table). |
| Open Source Code | No | The paper does not provide an explicit statement of code availability or a link to an open-source implementation of the methodology. |
| Open Datasets | Yes | We evaluate MBPO and these baselines on a set of MuJoCo continuous control tasks (Todorov et al., 2012) commonly used to evaluate model-free algorithms. |
| Dataset Splits | No | The paper mentions 'validation loss of the model' but does not provide specific details on the validation dataset split (e.g., percentages, sample counts, or explicit standard split references). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software such as MuJoCo, SAC, and PPO, but does not provide specific version numbers for any software components, which are required for reproducibility. |
| Experiment Setup | Yes | A full listing of the hyperparameters included in Algorithm 2 for all evaluation environments is given in Appendix C. |
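
For reference, below is a minimal sketch of the training loop described by Algorithm 2 (fit a dynamics model on real data, branch short model rollouts from states in the real buffer, and update the policy with SAC on the model-generated data). This is not the authors' implementation; the environment and model interfaces are simplified, and the helper names (`train_model`, `sac_update`, `policy.act`, `model.step`) are assumed placeholders.

```python
# Sketch of the MBPO loop (Algorithm 2); helper names are hypothetical placeholders
# for an ensemble dynamics model and a SAC learner, not the paper's code.
import random

def mbpo(env, model, policy, n_epochs=100, env_steps_per_epoch=1000,
         model_rollouts_per_step=400, rollout_length=1, grad_updates_per_step=20):
    d_env, d_model = [], []          # real and model-generated transitions
    state = env.reset()              # real implementations seed d_env with exploration data first
    for epoch in range(n_epochs):
        train_model(model, d_env)    # fit the dynamics model on real data (placeholder)
        for _ in range(env_steps_per_epoch):
            # 1) collect one real transition with the current policy
            action = policy.act(state)
            next_state, reward, done = env.step(action)   # simplified env interface
            d_env.append((state, action, reward, next_state, done))
            state = env.reset() if done else next_state

            # 2) branch short model rollouts from states sampled from the real buffer
            for _ in range(model_rollouts_per_step):
                s = random.choice(d_env)[0]
                for _ in range(rollout_length):
                    a = policy.act(s)
                    s_next, r, d = model.step(s, a)        # learned dynamics prediction
                    d_model.append((s, a, r, s_next, d))
                    if d:
                        break
                    s = s_next

            # 3) update the policy with SAC on the model-generated data
            for _ in range(grad_updates_per_step):
                sac_update(policy, d_model)
    return policy
```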