When to Trust Your Model: Model-Based Policy Optimization
Authors: Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation aims to study two primary questions: (1) How well does MBPO perform on benchmark reinforcement learning tasks, compared to state-of-the-art model-based and model-free algorithms? (2) What conclusions can we draw about appropriate model usage? (Figure 2: Training curves of MBPO and five baselines on continuous control benchmarks. Solid curves depict the mean of five trials and shaded regions correspond to standard deviation among trials.) |
| Researcher Affiliation | Academia | Michael Janner Justin Fu Marvin Zhang Sergey Levine University of California, Berkeley {janner, justinjfu, marvin, svlevine}@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1: Monotonic Model-Based Policy Optimization; Algorithm 2: Model-Based Policy Optimization with Deep Reinforcement Learning (a hedged sketch of the Algorithm 2 loop is given below the table). |
| Open Source Code | No | The paper does not provide an explicit statement of code availability or a link to an open-source implementation of the methodology. |
| Open Datasets | Yes | We evaluate MBPO and these baselines on a set of MuJoCo continuous control tasks (Todorov et al., 2012) commonly used to evaluate model-free algorithms. |
| Dataset Splits | No | The paper mentions 'validation loss of the model' but does not provide specific details on the validation dataset split (e.g., percentages, sample counts, or explicit standard split references). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software such as MuJoCo, SAC, and PPO, but does not provide specific version numbers for any software components, which are required for reproducibility. |
| Experiment Setup | Yes | A full listing of the hyperparameters included in Algorithm 2 for all evaluation environments is given in Appendix C. |
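
For reference, below is a minimal sketch of the training loop described by Algorithm 2 (fit a dynamics model on real data, branch short model rollouts from states in the real buffer, and update the policy with SAC on the model-generated data). This is not the authors' implementation; the environment and model interfaces are simplified, and the helper names (`train_model`, `sac_update`, `policy.act`, `model.step`) are assumed placeholders.

```python
# Sketch of the MBPO loop (Algorithm 2); helper names are hypothetical placeholders
# for an ensemble dynamics model and a SAC learner, not the paper's code.
import random

def mbpo(env, model, policy, n_epochs=100, env_steps_per_epoch=1000,
         model_rollouts_per_step=400, rollout_length=1, grad_updates_per_step=20):
    d_env, d_model = [], []          # real and model-generated transitions
    state = env.reset()              # real implementations seed d_env with exploration data first
    for epoch in range(n_epochs):
        train_model(model, d_env)    # fit the dynamics model on real data (placeholder)
        for _ in range(env_steps_per_epoch):
            # 1) collect one real transition with the current policy
            action = policy.act(state)
            next_state, reward, done = env.step(action)   # simplified env interface
            d_env.append((state, action, reward, next_state, done))
            state = env.reset() if done else next_state

            # 2) branch short model rollouts from states sampled from the real buffer
            for _ in range(model_rollouts_per_step):
                s = random.choice(d_env)[0]
                for _ in range(rollout_length):
                    a = policy.act(s)
                    s_next, r, d = model.step(s, a)        # learned dynamics prediction
                    d_model.append((s, a, r, s_next, d))
                    if d:
                        break
                    s = s_next

            # 3) update the policy with SAC on the model-generated data
            for _ in range(grad_updates_per_step):
                sac_update(policy, d_model)
    return policy
```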