Finding Counterfactually Optimal Action Sequences in Continuous State Spaces

Authors: Stratis Tsirtsis, Manuel Gomez-Rodriguez

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we evaluate the performance and the qualitative insights of our method by performing a series of experiments using real patient data from critical care."
Researcher Affiliation | Academia | "Stratis Tsirtsis, Max Planck Institute for Software Systems, Kaiserslautern, Germany, stsirtsis@mpi-sws.org; Manuel Gomez-Rodriguez, Max Planck Institute for Software Systems, Kaiserslautern, Germany, manuelgr@mpi-sws.org"
Pseudocode | Yes | "Algorithm 2: Graph search via A*" (a generic A* sketch follows the table)
Open Source Code | Yes | "Our code is accessible at https://github.com/Networks-Learning/counterfactual-continuous-mdp."
Open Datasets | Yes | "To evaluate our method, we use real patient data from MIMIC-III [54], a freely accessible critical care dataset commonly used in reinforcement learning for healthcare [6, 55–57]."
Dataset Splits | Yes | "Specifically, for each configuration of L_h and L_φ, we randomly split the dataset into a training and a validation set (with a size ratio 4-to-1), we train the corresponding SCM using the training set, and we evaluate the log-likelihood of the validation set based on the trained SCM." (a split-selection sketch follows the table)
Hardware Specification | Yes | "All experiments were performed using an internal cluster of machines equipped with 16 Intel(R) Xeon(R) 3.20GHz CPU cores, 512GBs of memory and 2 NVIDIA A40 48GB GPUs."
Software Dependencies | No | The paper mentions using neural networks and the Adam optimizer but does not specify software dependencies such as the programming language (e.g., Python), deep learning framework (e.g., PyTorch, TensorFlow), or their version numbers.
Experiment Setup | Yes | "We use an SCM with Lipschitz constants L_h = 1.0, L_φ = 0.1... We jointly train the weights of the networks h and φ and the covariance matrix of the noise prior on the observed patient transitions using stochastic gradient descent with the negative log-likelihood of each transition as a loss. Subsequently, we optimize those parameters using the Adam optimizer with a learning rate of 0.001, a batch size of 256, and we train the model for 100 epochs." (a training-loop sketch follows the table)
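The Pseudocode row points to Algorithm 2, a graph search via A*. The paper constructs its search graph, edge costs, and admissible heuristic from the counterfactual SCM, none of which is reproduced here; the sketch below is only a generic Python A* skeleton, with illustrative function names and priority-queue bookkeeping, showing the structure such a search follows.

```python
import heapq
import itertools

def a_star(start, is_goal, successors, heuristic):
    """Generic A* over an implicit graph (not the paper's Algorithm 2).

    successors(n) yields (step_cost, child) pairs;
    heuristic(n) must be an admissible lower bound on the cost-to-go.
    Returns a least-cost path as a list of nodes, or None if no goal is reachable.
    """
    tie = itertools.count()  # breaks ties so nodes themselves are never compared
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path                              # least-cost path found
        if g > best_g.get(node, float("inf")):
            continue                                 # stale queue entry
        for cost, child in successors(node):
            g_child = g + cost
            if g_child < best_g.get(child, float("inf")):
                best_g[child] = g_child
                heapq.heappush(frontier, (g_child + heuristic(child), next(tie),
                                          g_child, child, path + [child]))
    return None
```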
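The Dataset Splits row describes how each (L_h, L_φ) configuration is scored: a random 4-to-1 train/validation split, an SCM trained on the training part, and the validation log-likelihood used for selection. A minimal sketch of such a loop, assuming hypothetical `train_fn` and `loglik_fn` callables in place of the paper's actual SCM training and evaluation code:

```python
import numpy as np

def select_lipschitz_constants(transitions, configs, train_fn, loglik_fn, seed=0):
    """Pick the (L_h, L_phi) pair with the best validation log-likelihood.

    transitions: sequence of observed patient transitions
    configs:     iterable of candidate (L_h, L_phi) pairs
    train_fn:    callable(train_set, L_h, L_phi) -> fitted SCM        (stand-in)
    loglik_fn:   callable(scm, validation_set) -> avg log-likelihood  (stand-in)
    """
    idx = np.random.default_rng(seed).permutation(len(transitions))
    n_train = int(0.8 * len(transitions))            # 4-to-1 train/validation ratio
    train = [transitions[i] for i in idx[:n_train]]
    valid = [transitions[i] for i in idx[n_train:]]

    scores = {(L_h, L_phi): loglik_fn(train_fn(train, L_h, L_phi), valid)
              for L_h, L_phi in configs}
    best = max(scores, key=scores.get)               # highest validation log-likelihood
    return best, scores
```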
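The Experiment Setup row quotes the optimization recipe: negative log-likelihood per transition, Adam with learning rate 0.001, batch size 256, 100 epochs. Because the paper does not state a framework (see the Software Dependencies row), the PyTorch sketch below is an assumption throughout; the class name, network widths, and the way h, φ, and the learnable noise covariance combine into a transition distribution are illustrative stand-ins, not the authors' implementation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class GaussianSCM(nn.Module):
    """Illustrative transition model with networks h and phi and a learnable
    noise covariance; the composition below is a simplifying assumption."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.h = nn.Sequential(nn.Linear(state_dim + action_dim, hidden),
                               nn.ReLU(), nn.Linear(hidden, state_dim))
        self.phi = nn.Sequential(nn.Linear(state_dim + action_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, state_dim))
        # Square-root factor of the noise covariance (Sigma = F F^T), trained jointly.
        self.noise_factor = nn.Parameter(torch.eye(state_dim))

    def forward(self, s, a):
        x = torch.cat([s, a], dim=-1)
        mean = self.h(x) + self.phi(x)               # simplified stand-in for the SCM mean
        cov = (self.noise_factor @ self.noise_factor.T
               + 1e-5 * torch.eye(s.shape[-1]))      # jitter keeps cov positive definite
        return torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)

def train_scm(model, states, actions, next_states, epochs=100, lr=1e-3, batch_size=256):
    """Adam, lr 0.001, batch size 256, 100 epochs, per-transition NLL loss."""
    loader = DataLoader(TensorDataset(states, actions, next_states),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for s, a, s_next in loader:
            loss = -model(s, a).log_prob(s_next).mean()   # negative log-likelihood
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Parameterizing the covariance through a learnable square-root factor keeps it positive semidefinite while it is optimized jointly with h and φ, which matches the quoted description at a high level without claiming the authors' exact parameterization.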