When to Trust Your Model: Model-Based Policy Optimization

Authors: Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental evaluation aims to study two primary questions: (1) How well does MBPO perform on benchmark reinforcement learning tasks, compared to state-of-the-art model-based and model-free algorithms? (2) What conclusions can we draw about appropriate model usage?" (Figure 2: Training curves of MBPO and five baselines on continuous control benchmarks. Solid curves depict the mean of five trials and shaded regions correspond to standard deviation among trials.)
Researcher Affiliation | Academia | Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine (University of California, Berkeley), {janner, justinjfu, marvin, svlevine}@eecs.berkeley.edu
Pseudocode | Yes | Algorithm 1: Monotonic Model-Based Policy Optimization; Algorithm 2: Model-Based Policy Optimization with Deep Reinforcement Learning. (A hedged sketch of the Algorithm 2 training loop appears after this table.)
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the methodology.
Open Datasets | Yes | "We evaluate MBPO and these baselines on a set of MuJoCo continuous control tasks (Todorov et al., 2012) commonly used to evaluate model-free algorithms." (See the environment-instantiation sketch after this table.)
Dataset Splits | No | The paper mentions the "validation loss of the model" but does not provide specific details on how the validation split is constructed (e.g., percentages, sample counts, or references to a standard split). (An illustrative hold-out split is sketched after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions software such as MuJoCo, SAC, and PPO, but does not provide version numbers for any software components, which are needed for exact reproduction. (A snippet for recording package versions appears after this table.)
Experiment Setup | Yes | "A full listing of the hyperparameters included in Algorithm 2 for all evaluation environments is given in Appendix C." (A hedged sketch of one way to organize those hyperparameters appears after this table.)
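
Below is a minimal, self-contained sketch of the training loop described by Algorithm 2: fit a model on real data, branch short model rollouts from states in the real replay buffer, and update the policy on the model-generated data. Everything here is a toy placeholder, not the paper's implementation; the probabilistic ensemble and the SAC actor-critic updates of the actual method are abstracted into stubs.

    # Toy sketch of the MBPO loop (Algorithm 2). All components below are
    # placeholders: env_step stands in for the real environment, train_model
    # for the probabilistic ensemble, and train_policy for the SAC updates.
    import numpy as np

    rng = np.random.default_rng(0)
    obs_dim, act_dim = 3, 1

    def env_step(s, a):
        # Placeholder "real" environment: noisy linear dynamics.
        s_next = 0.9 * s + 0.1 * np.tanh(a).sum() + 0.01 * rng.normal(size=obs_dim)
        return s_next, -float(np.sum(s_next ** 2))

    def policy(s):
        # Placeholder policy; MBPO uses a SAC actor here.
        return rng.normal(scale=0.1, size=act_dim)

    def train_model(real_data):
        # Placeholder for fitting the model ensemble on real transitions.
        return lambda s, a: (0.9 * s + 0.1 * np.tanh(a).sum(), -float(np.sum(s ** 2)))

    def train_policy(model_data):
        # Placeholder for SAC gradient steps on model-generated transitions.
        pass

    real_buffer, model_buffer = [], []
    s = np.zeros(obs_dim)
    k = 1  # model rollout length; the paper tunes/schedules this per task
    for epoch in range(5):
        model = train_model(real_buffer)              # fit model on real data
        for step in range(25):                        # step in the real environment
            a = policy(s)
            s_next, r = env_step(s, a)
            real_buffer.append((s, a, r, s_next))
            s = s_next
            for _ in range(4):                        # branch k-step model rollouts
                sm = real_buffer[rng.integers(len(real_buffer))][0]
                for _ in range(k):
                    am = policy(sm)
                    sm_next, rm = model(sm, am)
                    model_buffer.append((sm, am, rm, sm_next))
                    sm = sm_next
            train_policy(model_buffer)                # policy improvement on model data
    print(len(real_buffer), len(model_buffer))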
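
The MuJoCo benchmark tasks are standard Gym environments; a minimal sketch of instantiating a few of them follows. The specific task names and version suffixes are illustrative assumptions rather than a list taken from the paper, and running this requires a local MuJoCo installation with mujoco-py.

    # Illustrative only: instantiate a few MuJoCo Gym tasks of the kind the
    # paper evaluates on. Task names and versions here are assumptions.
    import gym

    for env_id in ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Ant-v2"]:
        env = gym.make(env_id)        # requires MuJoCo and mujoco-py
        obs = env.reset()
        print(env_id, env.observation_space.shape, env.action_space.shape)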
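
For the "validation loss of the model" noted in the Dataset Splits row, the paper does not state how the validation set is constructed. The sketch below shows one common way to hold out a fraction of the real transitions for model validation; the 80/20 ratio is an illustrative assumption, not a value reported in the paper.

    # Hold out a fraction of transitions for computing a model validation loss.
    # The holdout_ratio is an illustrative choice, not taken from the paper.
    import numpy as np

    def split_transitions(transitions, holdout_ratio=0.2, seed=0):
        # Shuffle transitions and split them into train/validation sets.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(transitions))
        n_holdout = int(len(transitions) * holdout_ratio)
        val_idx, train_idx = idx[:n_holdout], idx[n_holdout:]
        return [transitions[i] for i in train_idx], [transitions[i] for i in val_idx]

    # Example with 1000 dummy (state, action, reward, next_state) tuples.
    dummy = [(np.zeros(3), np.zeros(1), 0.0, np.zeros(3)) for _ in range(1000)]
    train_set, val_set = split_transitions(dummy)
    print(len(train_set), len(val_set))  # 800 200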
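
Because no version numbers are reported, anyone attempting a reproduction would need to record their own software environment. The snippet below is one simple way to log installed package versions; the package list is illustrative.

    # Log installed versions of a few relevant packages; the list is illustrative.
    from importlib.metadata import version, PackageNotFoundError

    for pkg in ["numpy", "gym", "mujoco-py", "tensorflow"]:
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")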
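
Appendix C lists the Algorithm 2 hyperparameters per environment; the sketch below shows one plausible way to organize such a configuration. The keys mirror quantities named in Algorithm 2, but the values are placeholders to be filled in from Appendix C, not the paper's reported settings.

    # Hypothetical configuration layout; values must be filled in from Appendix C.
    mbpo_hyperparams = {
        "HalfCheetah": {
            "epochs": None,                      # number of training epochs
            "env_steps_per_epoch": None,         # real environment steps per epoch
            "model_rollouts_per_env_step": None,
            "rollout_length_k": None,            # short model-rollout horizon
            "policy_updates_per_env_step": None,
            "ensemble_size": None,               # number of dynamics models
        },
        # ... one entry per evaluation environment
    }
    print(mbpo_hyperparams["HalfCheetah"])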