MOPO: Model-based Offline Policy Optimization
Authors: Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y. Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we aim to study the following questions: (1) How does MOPO perform on standard offline RL benchmarks in comparison to prior state-of-the-art approaches? (2) Can MOPO solve tasks that require generalization to out-of-distribution behaviors? (3) How does each component in MOPO affect performance? |
| Researcher Affiliation | Academia | Stanford University, UC Berkeley; {tianheyu,gwthomas}@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Framework for Model-based Offline Policy Optimization (MOPO) with Reward Penalty (see the reward-penalty sketch after this table) |
| Open Source Code | Yes | The code is available online (footnote 5: "Code is released at https://github.com/tianheyu927/mopo.") |
| Open Datasets | Yes | To answer question (1), we evaluate our method on a large subset of datasets in the D4RL benchmark [18] based on the MuJoCo simulator [69], including three environments (halfcheetah, hopper, and walker2d) and four dataset types (random, medium, mixed, medium-expert), yielding a total of 12 problem settings. (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper describes the types of datasets used from the D4RL benchmark (random, medium, mixed, medium-expert) but does not provide specific train/validation/test split percentages or counts for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments; it only refers readers to the appendix: "For more details on the experimental set-up and hyperparameters, see Appendix G." |
| Software Dependencies | No | The paper mentions the use of the 'MuJoCo simulator' but does not provide specific software names with version numbers. |
| Experiment Setup | Yes | For more details on the experimental set-up and hyperparameters, see Appendix G. |
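
For the Pseudocode row, the following is a minimal sketch of the reward penalty at the core of Algorithm 1, assuming a Gaussian ensemble dynamics model whose per-member predicted standard deviations are available. The function name `penalized_reward` and the array shapes are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def penalized_reward(reward_pred, ensemble_stds, lam=1.0):
    """Illustrative sketch of MOPO's penalized reward (Algorithm 1).

    reward_pred:   (batch,) rewards predicted by the learned dynamics model
    ensemble_stds: (n_models, batch, obs_dim) predicted std deviations per ensemble member
    lam:           penalty coefficient lambda (a per-task hyperparameter in the paper)
    """
    # Uncertainty estimate u(s, a): max over the ensemble of the norm of the
    # predicted standard deviation, the heuristic used in the paper's practical algorithm.
    u = np.max(np.linalg.norm(ensemble_stds, axis=-1), axis=0)  # shape (batch,)
    # Penalized reward: r~(s, a) = r^(s, a) - lambda * u(s, a)
    return reward_pred - lam * u
```

During policy optimization, MOPO trains the policy on short rollouts from the learned model using this penalized reward in place of the model's raw reward prediction.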
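
For the Open Datasets row, one of the 12 problem settings (3 environments x 4 dataset types) can be loaded roughly as sketched below. This assumes the `d4rl` package and a compatible `gym` version are installed, and the environment-name version suffix (here `-v0`) may differ depending on the D4RL release.

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the offline datasets with gym

# One of the 12 problem settings evaluated in the paper: halfcheetah + medium dataset.
env = gym.make("halfcheetah-medium-v0")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...
print(dataset["observations"].shape, dataset["actions"].shape)
```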