Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

Authors: Michael Oberst, David Sontag

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the utility of this procedure with a synthetic environment of sepsis management."
Researcher Affiliation | Academia | "CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA."
Pseudocode | No | The paper describes a 'Monte Carlo procedure for drawing counterfactual trajectories' and its posterior inference method in prose, but does not include a structured pseudocode or algorithm block (a rough sketch of the sampling idea is given below the table).
Open Source Code | Yes | "All the code required to reproduce our experiments is available online at https://www.github.com/clinicalml/gumbel-max-scm"
Open Datasets | No | The paper uses '1000 patient trajectories from the simulator', which the authors constructed for this illustration, but there is no explicit statement or link confirming that this specific dataset is publicly available.
Dataset Splits | No | The paper mentions generating '1000 patient trajectories from the simulator' and using them to 'learn the parameters of the finite MDP', but it does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') required for replicating the experiments.
Experiment Setup | Yes | "Our simulator includes four vital signs (heart rate, blood pressure, oxygen concentration, and glucose levels) with discrete states (e.g., low, normal, high), along with three treatment options (antibiotics, vasopressors, and mechanical ventilation), all of which can be applied at each time step. Reward is +1 for discharge of a patient, and -1 for death. [...] the behaviour policy was constructed using Policy Iteration (Sutton & Barto, 2017) with full access to the parameters of the underlying MDP (including diabetes state). [...] To introduce variation, the policy takes a random alternative action w.p. 0.05. [...] The target policy is learned using Policy Iteration on the parameters of the learned MDP." (A generic policy-iteration sketch illustrating this recipe is also given below.)
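The 'Pseudocode' row above notes that the counterfactual sampling procedure is described only in prose. For intuition, here is a minimal sketch of the top-down Gumbel posterior sampling that underlies Gumbel-Max SCM counterfactuals for a single categorical transition; the function names, signatures, and example probabilities are illustrative assumptions and are not taken from the authors' released code.

```python
import numpy as np
from scipy.special import logsumexp

def truncated_gumbel(loc, upper_bound, rng):
    # Sample Gumbel(loc) conditioned on being <= upper_bound.
    g = loc + rng.gumbel()
    return -np.log(np.exp(-g) + np.exp(-upper_bound))

def posterior_gumbels(logits, observed, rng):
    # Sample Gumbel noise consistent with argmax(logits + noise) == observed
    # (top-down sampling: draw the max first, then the rest, truncated at the max).
    top = logsumexp(logits) + rng.gumbel()
    gumbels = np.empty(len(logits))
    for j in range(len(logits)):
        gumbels[j] = top if j == observed else truncated_gumbel(logits[j], top, rng)
    return gumbels - logits  # recover the noise terms g_j

def counterfactual_sample(obs_logits, cf_logits, observed, rng):
    # Reuse the posterior noise inferred under the observed (factual) logits,
    # then take the argmax under the counterfactual logits.
    noise = posterior_gumbels(obs_logits, observed, rng)
    return int(np.argmax(cf_logits + noise))

rng = np.random.default_rng(0)
p_obs = np.log([0.7, 0.2, 0.1])  # hypothetical transition probs under the observed action
p_cf = np.log([0.1, 0.2, 0.7])   # hypothetical transition probs under the alternative action
print(counterfactual_sample(p_obs, p_cf, observed=0, rng=rng))
```

Repeating a draw like this at each time step and rolling the result forward through the model dynamics roughly corresponds to the Monte Carlo procedure for counterfactual trajectories that the row refers to.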
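The 'Experiment Setup' row quotes how the policies are built: the behaviour policy comes from Policy Iteration on the true MDP with a random action taken with probability 0.05, and the target policy from Policy Iteration on the learned MDP. The sketch below shows a generic tabular policy-iteration routine plus an ε-soft wrapper; the array shapes, state-only rewards, and the uniform randomization over all actions are assumptions for this sketch, not details from the paper's simulator code.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.99):
    # P: (n_actions, n_states, n_states) transition probabilities,
    # R: (n_states,) state rewards (e.g., +1 discharge, -1 death, 0 otherwise).
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R exactly.
        P_pi = P[policy, np.arange(n_states), :]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: greedy with respect to the one-step lookahead.
        Q = R[None, :] + gamma * (P @ V)   # shape (n_actions, n_states)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

def epsilon_soft(greedy_policy, n_actions, eps=0.05):
    # Behaviour policy: greedy action w.p. 1 - eps, otherwise a uniformly random
    # action (mirroring the paper's 0.05 randomization; the exact scheme may differ).
    n_states = len(greedy_policy)
    pi = np.full((n_states, n_actions), eps / n_actions)
    pi[np.arange(n_states), greedy_policy] += 1.0 - eps
    return pi
```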