COMBO: Conservative Offline Model-Based Policy Optimization

Authors: Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through extensive experiments, we find that COMBO attains greater performance compared to prior offline RL methods on problems that demand generalization to related but previously unseen tasks, and also consistently matches or outperforms prior offline RL methods on widely studied offline RL benchmarks, including image-based tasks. |
| Researcher Affiliation | Collaboration | Stanford University, UC Berkeley, Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 COMBO: Conservative Model Based Offline Policy Optimization |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code for the methodology described. |
| Open Datasets | Yes | We evaluate COMBO on the OpenAI Gym [6] domains in the D4RL benchmark [12], which contains three environments (halfcheetah, hopper, and walker2d) and four dataset types (random, medium, medium-replay, and medium-expert). (Section 5.3) See the dataset-loading sketch below the table. |
| Dataset Splits | Yes | For all MuJoCo tasks from D4RL, we use the standard settings provided by D4RL. (Appendix B.1) For the walker task we construct 4 datasets: medium-replay (M-R), medium (M), medium-expert (ME), and expert, similar to Fu et al. [12], each consisting of 200 trajectories. (Section 5.2) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Additional details about the practical implementation and the hyperparameter selection rule are provided in Appendix B.1 and Appendix B.2, respectively. (Section 3) |
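As a minimal sketch of the open datasets referenced in the table, the D4RL Gym tasks named above (halfcheetah, hopper, walker2d with random/medium/medium-replay/medium-expert variants) can be loaded with the `d4rl` Python package. The specific environment id (`halfcheetah-medium-v2`) and package versions below are illustrative assumptions, not details fixed by the report or the paper.

```python
# Sketch: loading one of the D4RL Gym datasets mentioned in the Open Datasets row.
# Assumes `gym` and `d4rl` are installed; the environment id is an assumption.
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

env = gym.make("halfcheetah-medium-v2")

# Raw dataset: a dict of numpy arrays keyed by 'observations', 'actions',
# 'rewards', 'terminals', etc.
dataset = env.get_dataset()
print({k: v.shape for k, v in dataset.items() if hasattr(v, "shape")})

# Convenience view that also includes 'next_observations', as commonly used
# when building offline RL training batches.
transitions = d4rl.qlearning_dataset(env)
print(transitions["observations"].shape, transitions["next_observations"].shape)
```

The same pattern applies to the other environment/dataset combinations listed in the table by swapping the environment id (e.g. `hopper-medium-replay-v2`, `walker2d-medium-expert-v2`).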