Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Authors: Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the standard D4RL continuous control tasks, we find that our method significantly outperforms previous model-based approaches: e.g., MOPO by 116.4%, MOReL by 23.2%, and COMBO by 23.7%. Further, CBOP achieves state-of-the-art performance on 11 out of 18 benchmark datasets while performing on par on the remaining datasets. We evaluate CBOP on the D4RL benchmark of continuous control tasks (Fu et al., 2020).
Researcher Affiliation | Collaboration | University of Toronto, LG AI Research, Vector Institute
Pseudocode | Yes | Algorithm 1 Conservative Bayesian MVE; please see Algorithm 2 in Appendix B.1 for the full description of CBOP. (A hedged sketch of the conservative value-expansion target is given after the table.)
Open Source Code | Yes | We release our code at https://github.com/jihwan-jeong/CBOP.
Open Datasets | Yes | We evaluate these RQs on the standard D4RL offline RL benchmark (Fu et al., 2020). (A minimal dataset-loading example is given after the table.)
Dataset Splits | No | The paper mentions using the D4RL benchmark but does not provide specific details on training, validation, or testing dataset splits, such as percentages, sample counts, or explicit instructions for replication.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or server configurations used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., libraries, frameworks, or programming languages with their exact versions).
Experiment Setup | Yes | We use Adam optimizer with a learning rate of 1e-4 for all networks except the Q functions (3e-4) and a batch size of 256. For the D4RL experiments, we train for 1M gradient steps. ... For all experiments, we use discount factor γ = 0.99. (These reported values are collected in the snippet after the table.)
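
The Pseudocode row references Algorithm 1 (Conservative Bayesian MVE). Below is a minimal Python sketch of the general idea behind a conservative model-based value-expansion target: form h-step rollout targets from an ensemble of learned dynamics models and combine them pessimistically rather than by a plain average. The function name, array shapes, and the simple mean/std lower-confidence-bound are illustrative assumptions, not the authors' implementation; CBOP itself places a Bayesian posterior over the h-step targets as specified in Algorithms 1-2 of the paper.

```python
import numpy as np

def conservative_mve_target(rewards, terminal_values, gamma=0.99, lcb_coef=1.0):
    """Sketch of a conservative h-step value-expansion target.

    rewards:         array of shape (E, H) -- imagined rewards from an
                     ensemble of E dynamics-model rollouts over H steps
                     (hypothetical interface, not CBOP's exact one).
    terminal_values: array of shape (E, H + 1) -- bootstrapped critic
                     values V(s_h) at every rollout depth h = 0..H.
    Returns a scalar target: a lower confidence bound over the per-depth
    MVE targets, standing in for CBOP's Bayesian posterior aggregation.
    """
    H = rewards.shape[1]
    targets = []
    for h in range(H + 1):
        # h-step MVE target: discounted rewards up to depth h plus the
        # bootstrapped value at depth h, computed per ensemble member.
        discounts = gamma ** np.arange(h)
        ret = (discounts * rewards[:, :h]).sum(axis=1)
        ret += gamma ** h * terminal_values[:, h]
        targets.append(ret)
    targets = np.stack(targets)        # shape (H + 1, E)
    mu, sigma = targets.mean(), targets.std()
    return mu - lcb_coef * sigma       # conservative (LCB) target
```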
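The Open Datasets row points to the standard D4RL benchmark. The snippet below shows the usual way of loading one of its datasets with the public d4rl package; the specific dataset name (hopper-medium-v2) is one of the standard MuJoCo locomotion datasets and is chosen only for illustration, since the exact dataset versions are not listed in this table.

```python
import gym
import d4rl  # importing registers the D4RL offline environments with gym

# One of the standard D4RL MuJoCo locomotion datasets (illustrative choice).
env = gym.make("hopper-medium-v2")

# Dict of numpy arrays: observations, actions, rewards, next_observations, terminals.
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape, dataset["actions"].shape)
```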
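The Experiment Setup row quotes the main optimization hyperparameters. The snippet below simply instantiates those reported values with PyTorch's Adam optimizer; the network definitions are placeholders for illustration and do not reflect the architectures used in the paper.

```python
import torch

# Values quoted in the Experiment Setup row.
GAMMA = 0.99
BATCH_SIZE = 256
GRADIENT_STEPS = 1_000_000

# Placeholder networks (not the paper's architectures).
actor = torch.nn.Linear(17, 6)       # stand-in policy network
q_func = torch.nn.Linear(17 + 6, 1)  # stand-in Q network

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)  # all non-Q networks
q_opt = torch.optim.Adam(q_func.parameters(), lr=3e-4)     # Q functions
```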