COMBO: Conservative Offline Model-Based Policy Optimization
Authors: Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we find that COMBO attains greater performance compared to prior offline RL methods on problems that demand generalization to related but previously unseen tasks, and also consistently matches or outperforms prior offline RL methods on widely studied offline RL benchmarks, including image-based tasks. |
| Researcher Affiliation | Collaboration | 1Stanford University, 2UC Berkeley, 3Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 COMBO: Conservative Model Based Offline Policy Optimization |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code for the methodology described. |
| Open Datasets | Yes | We evaluate COMBO on the OpenAI Gym [6] domains in the D4RL benchmark [12], which contains three environments (halfcheetah, hopper, and walker2d) and four dataset types (random, medium, medium-replay, and medium-expert). (Section 5.3) |
| Dataset Splits | Yes | For all MuJoCo tasks from D4RL, we use the standard settings provided by D4RL. (Appendix B.1) For the walker task we construct 4 datasets: medium-replay (M-R), medium (M), medium-expert (ME), and expert, similar to Fu et al. [12], each consisting of 200 trajectories. (Section 5.2) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Additional details about the practical implementation and the hyperparameter selection rule are provided in Appendix B.1 and Appendix B.2 respectively. (Section 3) |
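The Open Datasets and Dataset Splits rows above refer to the D4RL Gym-MuJoCo datasets (halfcheetah, hopper, walker2d crossed with the random, medium, medium-replay, and medium-expert dataset types). For reference, the sketch below shows how such a dataset is typically loaded with the `d4rl` package; the environment ID and version suffix are assumptions and are not taken from the paper.

```python
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

# Hypothetical example: load the halfcheetah-medium dataset named in the paper.
# The exact environment ID/version used by the authors is not stated here.
env = gym.make('halfcheetah-medium-v0')
dataset = d4rl.qlearning_dataset(env)  # dict of NumPy arrays keyed by field name

print(dataset['observations'].shape,  # (N, obs_dim)
      dataset['actions'].shape,       # (N, act_dim)
      dataset['rewards'].shape,       # (N,)
      dataset['terminals'].shape)     # (N,)
```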
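The Pseudocode row points to Algorithm 1, whose central step is a conservative Q-function update: Q-values are penalized on state-actions drawn from model rollouts and encouraged on dataset state-actions, while the Bellman error is computed on an f-weighted mixture of real and model-generated transitions. Below is a minimal, hedged PyTorch sketch of that update; all names (`q_net`, `target_q_net`, `policy`, the batch layout) and the default `beta`/`f` values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def combo_critic_loss(q_net, target_q_net, policy, batch_real, batch_model,
                      beta=1.0, f=0.5, gamma=0.99):
    """Hedged sketch of the conservative critic update in Algorithm 1 of COMBO.

    `batch_real` is sampled from the offline dataset D, `batch_model` from short
    model rollouts under the current policy; each is a dict of tensors with keys
    'obs', 'act', 'rew', 'next_obs', 'done'. Interfaces are assumptions.
    """
    # Conservative regularizer: push Q down on model-rollout state-actions
    # and up on dataset state-actions.
    q_model = q_net(batch_model['obs'], batch_model['act'])
    q_data = q_net(batch_real['obs'], batch_real['act'])
    conservative_penalty = q_model.mean() - q_data.mean()

    # Standard TD error against a target network.
    def bellman_error(batch):
        with torch.no_grad():
            next_act = policy(batch['next_obs'])
            target = batch['rew'] + gamma * (1.0 - batch['done']) * \
                target_q_net(batch['next_obs'], next_act)
        return F.mse_loss(q_net(batch['obs'], batch['act']), target)

    # Bellman error on the f-weighted mixture of real and model transitions.
    bellman = f * bellman_error(batch_real) + (1.0 - f) * bellman_error(batch_model)

    return beta * conservative_penalty + 0.5 * bellman
```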