COMBO: Conservative Offline Model-Based Policy Optimization

Authors: Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through extensive experiments, we find that COMBO attains greater performance compared to prior offline RL methods on problems that demand generalization to related but previously unseen tasks, and also consistently matches or outperforms prior offline RL methods on widely studied offline RL benchmarks, including image-based tasks. |
| Researcher Affiliation | Collaboration | Stanford University, UC Berkeley, Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 COMBO: Conservative Model Based Offline Policy Optimization |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code for the methodology described. |
| Open Datasets | Yes | We evaluate COMBO on the OpenAI Gym [6] domains in the D4RL benchmark [12], which contains three environments (halfcheetah, hopper, and walker2d) and four dataset types (random, medium, medium-replay, and medium-expert). (Section 5.3) See the dataset-loading sketch below the table. |
| Dataset Splits | Yes | For all MuJoCo tasks from D4RL, we use the standard settings provided by D4RL. (Appendix B.1) For the walker task we construct 4 datasets: medium-replay (M-R), medium (M), medium-expert (ME), and expert, similar to Fu et al. [12], each consisting of 200 trajectories. (Section 5.2) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Additional details about the practical implementation and the hyperparameter selection rule are provided in Appendix B.1 and Appendix B.2, respectively. (Section 3) |
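As a minimal sketch of the open datasets referenced in the table, the D4RL Gym tasks named above (halfcheetah, hopper, walker2d with random/medium/medium-replay/medium-expert variants) can be loaded with the `d4rl` Python package. The specific environment id (`halfcheetah-medium-v2`) and package versions below are illustrative assumptions, not details fixed by the report or the paper.

```python
# Sketch: loading one of the D4RL Gym datasets mentioned in the Open Datasets row.
# Assumes `gym` and `d4rl` are installed; the environment id is an assumption.
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

env = gym.make("halfcheetah-medium-v2")

# Raw dataset: a dict of numpy arrays keyed by 'observations', 'actions',
# 'rewards', 'terminals', etc.
dataset = env.get_dataset()
print({k: v.shape for k, v in dataset.items() if hasattr(v, "shape")})

# Convenience view that also includes 'next_observations', as commonly used
# when building offline RL training batches.
transitions = d4rl.qlearning_dataset(env)
print(transitions["observations"].shape, transitions["next_observations"].shape)
```

The same pattern applies to the other environment/dataset combinations listed in the table by swapping the environment id (e.g. `hopper-medium-replay-v2`, `walker2d-medium-expert-v2`).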