MOPO: Model-based Offline Policy Optimization

Authors: Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y. Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we aim to study the following questions: (1) How does MOPO perform on standard offline RL benchmarks in comparison to prior state-of-the-art approaches? (2) Can MOPO solve tasks that require generalization to out-of-distribution behaviors? (3) How does each component in MOPO affect performance?
Researcher Affiliation | Academia | Stanford University, UC Berkeley. {tianheyu,gwthomas}@cs.stanford.edu
Pseudocode | Yes | Algorithm 1: Framework for Model-based Offline Policy Optimization (MOPO) with Reward Penalty. (A hedged sketch of the reward-penalty step appears after this table.)
Open Source Code | Yes | The code is available online, released at https://github.com/tianheyu927/mopo.
Open Datasets | Yes | To answer question (1), we evaluate our method on a large subset of datasets in the D4RL benchmark [18] based on the MuJoCo simulator [69], including three environments (halfcheetah, hopper, and walker2d) and four dataset types (random, medium, mixed, medium-expert), yielding a total of 12 problem settings. (A hedged loading sketch for these datasets also follows the table.)
Dataset Splits | No | The paper describes the types of datasets used from the D4RL benchmark (random, medium, mixed, medium-expert) but does not provide specific train/validation/test split percentages or counts for reproduction.
Hardware Specification | No | The paper does not specify the hardware used; it only refers readers to the appendix: "For more details on the experimental set-up and hyperparameters, see Appendix G."
Software Dependencies | No | The paper mentions the use of the 'MuJoCo simulator' but does not provide specific software names with version numbers.
Experiment Setup | Yes | For more details on the experimental set-up and hyperparameters, see Appendix G.
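
The reward-penalty step named in the Pseudocode row is the core of Algorithm 1: MOPO learns an ensemble of probabilistic dynamics models and penalizes the model-predicted reward by an uncertainty estimate, r̃(s, a) = r̂(s, a) - λ u(s, a). Below is a minimal sketch of that step only, assuming a hypothetical `ensemble_predict` helper and illustrative shapes; it is not the authors' released implementation.

```python
# Minimal sketch of MOPO's uncertainty-penalized reward (the key step of Algorithm 1).
# `ensemble_predict`, `lam`, and the shapes below are illustrative assumptions,
# not the authors' released code.
import numpy as np

def penalized_reward(state, action, ensemble_predict, lam=1.0):
    """Compute r_hat(s, a) - lam * u(s, a).

    ensemble_predict is assumed to return, for each of E ensemble members,
    a predicted reward (shape (E,)) and the standard deviation of its
    Gaussian next-state prediction (shape (E, state_dim)).
    """
    rewards, stds = ensemble_predict(state, action)
    # Heuristic uncertainty estimate used in the paper: the maximum, over
    # ensemble members, of the norm of the predicted standard deviation.
    u = np.max(np.linalg.norm(stds, axis=-1))
    return float(np.mean(rewards) - lam * u)
```

In the full method, short model rollouts generated under this penalized reward are mixed with the offline data and optimized with an off-policy actor-critic algorithm (SAC in the paper).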
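
For the D4RL datasets listed in the Open Datasets row, a common loading path is the `d4rl` Python package, which registers the offline environments with `gym`. The environment id below is an assumed example following D4RL's naming convention; the exact version suffix depends on the installed release.

```python
# Sketch of loading one of the D4RL datasets named above (halfcheetah, medium).
# Assumes the `d4rl` package and a MuJoCo-enabled gym install; the exact
# environment id / version suffix depends on the D4RL release.
import gym
import d4rl  # noqa: F401  (importing registers the offline envs with gym)

env = gym.make("halfcheetah-medium-v0")
data = d4rl.qlearning_dataset(env)  # observations, actions, rewards, terminals, next_observations
print(data["observations"].shape, data["actions"].shape)
```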