reproducibilityindex.ai

Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

Authors: Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records.
Researcher Affiliation	Collaboration	1 ETH AI Center; 2 Department of Computer Science, ETH Zürich; 3 Max Planck Institute for Intelligent Systems, Tübingen; 4 Google Research
Pseudocode	Yes	Algorithm 1: Delphic Offline Reinforcement Learning
Open Source Code	Yes	We include source code as supplementary material.
Open Datasets	Yes	Our real-world data experiment is based on the publicly available HiRID dataset (Hyland et al., 2020).
Dataset Splits	Yes	Training is carried out for 50 epochs or until loss on the validation subset (10% of training data) increases for more than 5 consecutive epochs.
Hardware Specification	Yes	Training is carried out on NVIDIA RTX2080Ti GPUs on our local cluster, using the Adam optimiser with default learning rate and a batch size of 32.
Software Dependencies	No	All reinforcement learning algorithms and baselines are implemented based on the open access d3rlpy library (Seno and Imai, 2022).
Experiment Setup	Yes	Between world models w, the confounder space dimensionality is randomly varied over |Z| = {1, 2, 4, 8, 16}, and the prior for p(z) = N(z; 0, Σ²) is randomly varied through the variance for each z-dimension, Σ²_ii = {1.0, 0.1, 0.01}. The discount factor used is γ = 0.99, and state and actions are normalised to mean 0 and variance 1 (Fujimoto and Gu, 2021) for all algorithms. Training is carried out... with a batch size of 32. Discrete CQL (Kumar et al., 2019) is implemented with a penalty hyperparameter α of 1.0 (sepsis environment) and 0.5 (ICU dataset), tuned over the following values: {0.1, 0.5, 1.0, 2.0, 5.0}.
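
The Dataset Splits and Experiment Setup rows describe two procedures that are easy to misread in prose: how each world model's confounder prior is varied, and when training stops. The Python sketch below is one possible reading, not the authors' code. The function names `sample_world_model_config` and `stopping_epoch` are hypothetical, as is the interpretation that "increases" means a rise relative to the previous epoch's validation loss. Only the value grids, the 50-epoch cap, and the patience of 5 come from the quoted text.

```python
import random

# Value grids quoted in the Experiment Setup row.
CONFOUNDER_DIMS = [1, 2, 4, 8, 16]
PRIOR_VARIANCES = [1.0, 0.1, 0.01]


def sample_world_model_config(rng):
    """Draw one hypothetical world-model configuration.

    Assumption: the dimensionality and each diagonal variance are drawn
    uniformly and independently; the source only says "randomly varied".
    """
    z_dim = rng.choice(CONFOUNDER_DIMS)
    # One variance per z-dimension, i.e. the diagonal of the prior covariance.
    variances = [rng.choice(PRIOR_VARIANCES) for _ in range(z_dim)]
    return {"z_dim": z_dim, "prior_variances": variances}


def stopping_epoch(val_losses, max_epochs=50, patience=5):
    """Return the 1-indexed epoch at which training would stop.

    Assumed reading: stop once the validation loss has risen, relative to
    the previous epoch, for more than `patience` consecutive epochs, or
    when `max_epochs` is reached, whichever comes first.
    """
    consecutive_increases = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if epoch > 1 and loss > val_losses[epoch - 2]:
            consecutive_increases += 1
        else:
            consecutive_increases = 0
        if consecutive_increases > patience:
            return epoch
    return min(len(val_losses), max_epochs)
```

For example, `sample_world_model_config(random.Random(0))` returns a dictionary whose `prior_variances` list has exactly `z_dim` entries. A different reading of the stopping rule, such as comparing against the best validation loss so far, would change only the comparison inside `stopping_epoch`.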