Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
Authors: Allen Nie, Yannis Flet-Berliac, Deon Jordan, William Steenbergen, Emma Brunskill
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to alternate approaches, our proposed pipeline outputs higher-performing deployed policies from a broad range of offline policy learning algorithms and across various simulation domains in healthcare, education, and robotics. |
| Researcher Affiliation | Academia | Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen, Emma Brunskill; Department of Computer Science, Stanford University; *anie@stanford.edu |
| Pseudocode | Yes | We propose a general pipeline: Split Select Retrain (SSR) (of which we provide a pseudo-code in Algorithm 1, Appendix A.4) |
| Open Source Code | Yes | 3. (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Supplementary Material. |
| Open Datasets | Yes | We conduct experiments on eight datasets (Figure 2) from five domains (details in Appendix A.14), which we give a short description below, and use as many as 540 candidate AH pairs for the Sepsis POMDP domain. ... D4RL (Fu et al., 2020) is an offline RL standardized benchmark designed and commonly used to evaluate the progress of offline RL algorithms. ... Robomimic (Mandlekar et al., 2021) is composed of various continuous control robotics environments with suboptimal human data. |
| Dataset Splits | Yes | The simplest method to train and verify an algorithm's performance without access to any simulator is to split the data into a train set Dtrain and a validation set Dvalid. ... First, we split and create different partitions of the input dataset. For each train/validation split, each algorithm-hyperparameter (AH) pair is trained on the training set and evaluated using the input OPE method to yield an estimated value on the validation set. ... In the proof and our experiments, we focus on when the training and validation sets are of equal size. (Minimal sketches of this split/select/retrain loop and of a WIS-style OPE estimator follow the table.) |
| Hardware Specification | No | The paper does not specify the exact hardware used for experiments (e.g., specific GPU models, CPU types, or detailed cloud instance configurations). While it indicates 'Yes' to question 3(d) in its self-assessment about compute resources, this information is not found within the provided paper text. |
| Software Dependencies | No | The paper mentions various algorithms and methods (e.g., WIS, FQE, BC, CQL) but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | We experiment with popular offline RL methods (see Table 2 and we provide algorithmic and hyperparameter details in Table A.2). ... Table A.2: Hyperparameters |
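
The Pseudocode and Dataset Splits rows above describe the paper's Split Select Retrain (SSR) procedure: partition the logged data, train every algorithm-hyperparameter (AH) pair on each training half, score the resulting policies with an off-policy evaluation (OPE) method on the held-out half, select the best-scoring AH pair, and retrain it on the full dataset. The sketch below is a minimal illustration of that loop under assumed interfaces; the `train` and `ope_estimate` callables and all names are hypothetical, not the authors' code (Algorithm 1, Appendix A.4).

```python
import numpy as np

def split_select_retrain(dataset, candidates, ope_estimate, n_splits=3, seed=0):
    """Hedged sketch of the Split Select Retrain (SSR) idea.

    `dataset` is a list of trajectories; `candidates` is a list of
    (algorithm, hyperparameters) pairs where `algorithm(**hyperparameters)`
    exposes a hypothetical `train(trajectories) -> policy` method;
    `ope_estimate(policy, trajectories)` is any OPE routine (e.g. FQE or WIS).
    """
    rng = np.random.default_rng(seed)
    scores = {i: [] for i in range(len(candidates))}

    # 1) Split: build several equal-size train/validation partitions.
    for _ in range(n_splits):
        perm = rng.permutation(len(dataset))
        half = len(dataset) // 2
        train = [dataset[i] for i in perm[:half]]
        valid = [dataset[i] for i in perm[half:]]

        # Train every algorithm-hyperparameter (AH) pair on the training half
        # and score the resulting policy with OPE on the held-out half.
        for i, (algo, hparams) in enumerate(candidates):
            policy = algo(**hparams).train(train)
            scores[i].append(ope_estimate(policy, valid))

    # 2) Select: pick the AH pair with the best average OPE estimate.
    best = max(scores, key=lambda i: np.mean(scores[i]))
    algo, hparams = candidates[best]

    # 3) Retrain: refit the selected AH pair on the full dataset for deployment.
    return algo(**hparams).train(dataset)
```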
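The selection step can plug in any OPE estimator; the paper names WIS and FQE among the methods used. The following is a minimal weighted importance sampling (WIS) sketch under an assumed trajectory format of (state, action, reward, behavior_prob) tuples; this interface is illustrative, not taken from the paper's code.

```python
import numpy as np

def wis_estimate(policy_prob, trajectories, gamma=0.99):
    """Hedged sketch of weighted (self-normalized) importance sampling.

    `policy_prob(state, action)` is assumed to return the evaluation
    policy's probability of the logged action.
    """
    weights, returns = [], []
    for traj in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (s, a, r, b_prob) in enumerate(traj):
            ratio *= policy_prob(s, a) / b_prob  # cumulative importance ratio
            ret += (gamma ** t) * r              # discounted return
        weights.append(ratio)
        returns.append(ret)
    weights = np.asarray(weights)
    # Self-normalized estimate: weighted average of trajectory returns.
    return float(np.dot(weights, returns) / weights.sum())
```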