Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
Authors: Michael Oberst, David Sontag
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of this procedure with a synthetic environment of sepsis management. |
| Researcher Affiliation | Academia | 1CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA. |
| Pseudocode | No | The paper describes a 'Monte Carlo procedure for drawing counterfactual trajectories' and methods for posterior inference in text, but does not include a structured pseudocode or algorithm block. |
| Open Source Code | Yes | All the code required to reproduce our experiments is available online at https://www.github.com/clinicalml/gumbel-max-scm |
| Open Datasets | No | The paper uses '1000 patient trajectories from the simulator', a dataset constructed for this illustration, but there is no explicit statement or link confirming that this specific dataset is publicly available. |
| Dataset Splits | No | The paper mentions generating '1000 patient trajectories from the simulator' and using them to 'learn the parameters of the finite MDP,' but it does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') required for replicating the experiments. |
| Experiment Setup | Yes | Our simulator includes four vital signs (heart rate, blood pressure, oxygen concentration, and glucose levels) with discrete states (e.g., low, normal, high), along with three treatment options (antibiotics, vasopressors, and mechanical ventilation), all of which can be applied at each time step. Reward is +1 for discharge of a patient, and -1 for death. [...] the behaviour policy was constructed using Policy Iteration (Sutton & Barto, 2017) with full access to the parameters of the underlying MDP (including diabetes state). [...] To introduce variation, the policy takes a random alternative action w.p. 0.05. [...] The target policy is learned using Policy Iteration on the parameters of the learned MDP. |
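The table notes that the paper describes its 'Monte Carlo procedure for drawing counterfactual trajectories' only in prose, without a pseudocode block. For readers assessing reproducibility, the core step of a Gumbel-Max SCM — sampling Gumbel noise consistent with an observed discrete outcome, then replaying that noise under an alternative policy — can be sketched as below. This is a minimal illustration using standard top-down truncated-Gumbel sampling, not the authors' released code; the function names are our own.

```python
import numpy as np


def sample_posterior_gumbels(log_p, observed, rng):
    """Sample noisy logits (log_p + Gumbel noise) conditioned on
    `observed` being the argmax, via top-down truncated-Gumbel sampling.
    Returns the posterior Gumbel noise terms."""
    k = len(log_p)
    # The maximum noisy logit follows Gumbel(logsumexp(log_p)),
    # which is Gumbel(0) when log_p is normalized.
    top = np.log(np.sum(np.exp(log_p))) + rng.gumbel()
    noisy = np.empty(k)
    noisy[observed] = top
    for j in range(k):
        if j == observed:
            continue
        g = log_p[j] + rng.gumbel()
        # Truncate this coordinate below the observed maximum,
        # so the observed outcome stays the argmax.
        noisy[j] = -np.logaddexp(-g, -top)
    return noisy - log_p


def counterfactual_sample(log_p_obs, log_p_cf, observed, rng):
    """Draw a counterfactual outcome: infer posterior Gumbel noise under
    the observed distribution, then take the argmax under the
    counterfactual distribution with that same noise."""
    g = sample_posterior_gumbels(log_p_obs, observed, rng)
    return int(np.argmax(log_p_cf + g))
```

One consequence worth checking in any implementation is the counterfactual-consistency property the paper relies on: if the counterfactual distribution equals the observed one, the counterfactual outcome must reproduce the observed outcome with probability 1.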