Revisiting Design Choices in Offline Model-Based Reinforcement Learning

Authors: Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, Stephen J. Roberts

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using these insights, we show that selecting these key hyperparameters using Bayesian Optimization produces superior configurations that are vastly different to those currently used in existing hand-tuned state-of-the-art methods, and result in drastically stronger performance.
Researcher Affiliation | Academia | Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts; Department of Engineering, University of Oxford
Pseudocode | No | The paper describes algorithms and methods in prose, but it does not include any formal pseudocode blocks or algorithm listings.
Open Source Code | No | The paper mentions that 'The D4RL (Fu et al., 2021a) codebase and datasets used for the empirical evaluation is available under the CC BY 4.0 Licence', but this refers to a third-party dataset and codebase, not the authors' own implementation of their methodology or experiments.
Open Datasets | Yes | Using D4RL (Fu et al., 2021a), we train models on each dataset, then evaluate them on other datasets from the same environment, but collected under different policies.
Dataset Splits | No | The paper uses the D4RL datasets and refers to 'train' and 'test' scenarios, but it does not explicitly provide details about specific training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit mention of validation set usage for hyperparameter tuning) beyond using benchmark datasets.
Hardware Specification | Yes | Each BO iteration is run for 300 epochs on a single seed, and the full optimization over an offline dataset took ~200 hours on an NVIDIA GeForce GTX 1080 Ti GPU, taken up predominantly by MOPO training.
Software Dependencies | No | The paper mentions using Python and specific algorithms/frameworks like SAC and Bayesian Optimization (CASMOPOLITAN), but it does not provide specific version numbers for any software dependencies (e.g., Python version, PyTorch version, or specific library versions).
Experiment Setup | Yes | We define our search space over hyperparameters most related to uncertainty quantification: Penalty type (categorical): taking values over {Max Aleatoric, Max Pairwise Diff, LOO KL, LL Var, Ensemble Std, Ensemble Variance}. Penalty scale λ (continuous): taking values over [1, 100]. h (integer): taking values over {1, 2, ..., 50}. Models N (integer): taking values over {1, 2, ..., 15}.
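As context for the Open Datasets row, the following is a minimal sketch of loading several D4RL datasets for a single environment, assuming the gym and d4rl packages; the halfcheetah dataset names are illustrative and not necessarily the exact ones used in the paper.

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the offline datasets with gym

# Illustrative D4RL datasets for one environment, each collected under a
# different behaviour policy.
dataset_names = [
    "halfcheetah-random-v2",
    "halfcheetah-medium-v2",
    "halfcheetah-medium-replay-v2",
    "halfcheetah-medium-expert-v2",
]

datasets = {}
for name in dataset_names:
    env = gym.make(name)
    # Each dataset is a dict of numpy arrays: observations, actions,
    # next_observations, rewards, terminals.
    datasets[name] = d4rl.qlearning_dataset(env)

# A dynamics model trained on one of these datasets can then be evaluated on
# the others (same environment, different data-collecting policies), matching
# the cross-dataset protocol quoted in the Open Datasets row.
```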
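The Experiment Setup row describes a mixed categorical/continuous/integer search space. The sketch below expresses that space with scikit-optimize's gp_minimize as a generic stand-in for the CASMOPOLITAN optimizer used in the paper; train_and_evaluate_mopo is a hypothetical placeholder (stubbed here with a random score so the script runs) for training MOPO under a given configuration and returning its normalized D4RL score.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real
from skopt.utils import use_named_args

# Mixed search space matching the Experiment Setup row:
# penalty type (categorical), penalty scale lambda (continuous),
# rollout length h (integer), number of ensemble models N (integer).
space = [
    Categorical(["max_aleatoric", "max_pairwise_diff", "loo_kl",
                 "ll_var", "ensemble_std", "ensemble_variance"],
                name="penalty_type"),
    Real(1.0, 100.0, name="penalty_scale"),
    Integer(1, 50, name="rollout_length"),
    Integer(1, 15, name="num_models"),
]

rng = np.random.default_rng(0)

def train_and_evaluate_mopo(penalty_type, penalty_scale, rollout_length, num_models):
    # Hypothetical placeholder for the expensive step: train MOPO with this
    # configuration for a fixed number of epochs and return the normalized
    # D4RL score. A random value stands in here.
    return rng.uniform(0.0, 100.0)

@use_named_args(space)
def objective(penalty_type, penalty_scale, rollout_length, num_models):
    score = train_and_evaluate_mopo(penalty_type, penalty_scale,
                                    rollout_length, num_models)
    return -score  # gp_minimize minimizes, so negate the score

result = gp_minimize(objective, space, n_calls=50, random_state=0)
print("Best configuration:", result.x, "best score:", -result.fun)
```

This is only a sketch of the optimization loop's shape; the paper's actual optimizer (CASMOPOLITAN) is designed specifically for such mixed categorical/continuous spaces, whereas gp_minimize handles categorical dimensions by one-hot encoding them.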