Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

Authors: Michael Oberst, David Sontag

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the utility of this procedure with a synthetic environment of sepsis management."
Researcher Affiliation | Academia | "CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA."
Pseudocode | No | The paper describes a 'Monte Carlo procedure for drawing counterfactual trajectories' and its posterior inference method in prose, but does not include a structured pseudocode or algorithm block (a rough sketch of the sampling idea is given below the table).
Open Source Code | Yes | "All the code required to reproduce our experiments is available online at https://www.github.com/clinicalml/gumbel-max-scm"
Open Datasets | No | The paper uses '1000 patient trajectories from the simulator', which the authors constructed for this illustration, but there is no explicit statement or link confirming that this specific dataset is publicly available.
Dataset Splits | No | The paper mentions generating '1000 patient trajectories from the simulator' and using them to 'learn the parameters of the finite MDP', but it does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') required for replicating the experiments.
Experiment Setup | Yes | "Our simulator includes four vital signs (heart rate, blood pressure, oxygen concentration, and glucose levels) with discrete states (e.g., low, normal, high), along with three treatment options (antibiotics, vasopressors, and mechanical ventilation), all of which can be applied at each time step. Reward is +1 for discharge of a patient, and -1 for death. [...] the behaviour policy was constructed using Policy Iteration (Sutton & Barto, 2017) with full access to the parameters of the underlying MDP (including diabetes state). [...] To introduce variation, the policy takes a random alternative action w.p. 0.05. [...] The target policy is learned using Policy Iteration on the parameters of the learned MDP." (A generic policy-iteration sketch illustrating this recipe is also given below.)
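The 'Pseudocode' row above notes that the counterfactual sampling procedure is described only in prose. For intuition, here is a minimal sketch of the top-down Gumbel posterior sampling that underlies Gumbel-Max SCM counterfactuals for a single categorical transition; the function names, signatures, and example probabilities are illustrative assumptions and are not taken from the authors' released code.

```python
import numpy as np
from scipy.special import logsumexp

def truncated_gumbel(loc, upper_bound, rng):
    # Sample Gumbel(loc) conditioned on being <= upper_bound.
    g = loc + rng.gumbel()
    return -np.log(np.exp(-g) + np.exp(-upper_bound))

def posterior_gumbels(logits, observed, rng):
    # Sample Gumbel noise consistent with argmax(logits + noise) == observed
    # (top-down sampling: draw the max first, then the rest, truncated at the max).
    top = logsumexp(logits) + rng.gumbel()
    gumbels = np.empty(len(logits))
    for j in range(len(logits)):
        gumbels[j] = top if j == observed else truncated_gumbel(logits[j], top, rng)
    return gumbels - logits  # recover the noise terms g_j

def counterfactual_sample(obs_logits, cf_logits, observed, rng):
    # Reuse the posterior noise inferred under the observed (factual) logits,
    # then take the argmax under the counterfactual logits.
    noise = posterior_gumbels(obs_logits, observed, rng)
    return int(np.argmax(cf_logits + noise))

rng = np.random.default_rng(0)
p_obs = np.log([0.7, 0.2, 0.1])  # hypothetical transition probs under the observed action
p_cf = np.log([0.1, 0.2, 0.7])   # hypothetical transition probs under the alternative action
print(counterfactual_sample(p_obs, p_cf, observed=0, rng=rng))
```

Repeating a draw like this at each time step and rolling the result forward through the model dynamics roughly corresponds to the Monte Carlo procedure for counterfactual trajectories that the row refers to.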
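The 'Experiment Setup' row quotes how the policies are built: the behaviour policy comes from Policy Iteration on the true MDP with a random action taken with probability 0.05, and the target policy from Policy Iteration on the learned MDP. The sketch below shows a generic tabular policy-iteration routine plus an ε-soft wrapper; the array shapes, state-only rewards, and the uniform randomization over all actions are assumptions for this sketch, not details from the paper's simulator code.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.99):
    # P: (n_actions, n_states, n_states) transition probabilities,
    # R: (n_states,) state rewards (e.g., +1 discharge, -1 death, 0 otherwise).
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R exactly.
        P_pi = P[policy, np.arange(n_states), :]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: greedy with respect to the one-step lookahead.
        Q = R[None, :] + gamma * (P @ V)   # shape (n_actions, n_states)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

def epsilon_soft(greedy_policy, n_actions, eps=0.05):
    # Behaviour policy: greedy action w.p. 1 - eps, otherwise a uniformly random
    # action (mirroring the paper's 0.05 randomization; the exact scheme may differ).
    n_states = len(greedy_policy)
    pi = np.full((n_states, n_actions), eps / n_actions)
    pi[np.arange(n_states), greedy_policy] += 1.0 - eps
    return pi
```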