Finding Counterfactually Optimal Action Sequences in Continuous State Spaces
Authors: Stratis Tsirtsis, Manuel Gomez-Rodriguez
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate the performance and the qualitative insights of our method by performing a series of experiments using real patient data from critical care. |
| Researcher Affiliation | Academia | Stratis Tsirtsis, Max Planck Institute for Software Systems, Kaiserslautern, Germany, stsirtsis@mpi-sws.org; Manuel Gomez-Rodriguez, Max Planck Institute for Software Systems, Kaiserslautern, Germany, manuelgr@mpi-sws.org |
| Pseudocode | Yes | Algorithm 2: Graph search via A* (an illustrative A* sketch follows the table) |
| Open Source Code | Yes | Our code is accessible at https://github.com/Networks-Learning/counterfactual-continuous-mdp. |
| Open Datasets | Yes | To evaluate our method, we use real patient data from MIMIC-III [54], a freely accessible critical care dataset commonly used in reinforcement learning for healthcare [6, 55–57]. |
| Dataset Splits | Yes | Specifically, for each configuration of L_h and L_φ, we randomly split the dataset into a training and a validation set (with a size ratio 4-to-1), we train the corresponding SCM using the training set, and we evaluate the log-likelihood of the validation set based on the trained SCM. |
| Hardware Specification | Yes | All experiments were performed using an internal cluster of machines equipped with 16 Intel(R) Xeon(R) 3.20GHz CPU cores, 512GBs of memory and 2 NVIDIA A40 48GB GPUs. |
| Software Dependencies | No | The paper mentions using 'neural networks' and the 'Adam optimizer' but does not specify software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or their specific version numbers. |
| Experiment Setup | Yes | We use an SCM with Lipschitz constants L_h = 1.0, L_φ = 0.1... We jointly train the weights of the networks h and φ and the covariance matrix of the noise prior on the observed patient transitions using stochastic gradient descent with the negative log-likelihood of each transition as a loss. Specifically, we optimize those parameters using the Adam optimizer with a learning rate of 0.001, a batch size of 256, and we train the model for 100 epochs. (A hedged training sketch follows the table.) |
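The Pseudocode row above refers to the paper's Algorithm 2, a graph search via A*. The authors' problem-specific node expansion and admissible heuristic are not reproduced in the paper excerpt, so the following is only a minimal, generic A* sketch in Python; `neighbors` and `heuristic` are hypothetical callables standing in for the paper's actual components.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Generic A* graph search (illustrative, not the paper's Algorithm 2).

    `neighbors(node)` yields (successor, edge_cost) pairs and `heuristic(node)`
    must never overestimate the remaining cost for the result to be optimal.
    """
    tie = count()  # tie-breaker so nodes themselves never need to be comparable
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g  # first goal pop is optimal under an admissible heuristic
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry superseded by a cheaper path
        for succ, cost in neighbors(node):
            g_new = g + cost
            if g_new < best_g.get(succ, float("inf")):
                best_g[succ] = g_new
                heapq.heappush(
                    frontier,
                    (g_new + heuristic(succ), next(tie), g_new, succ, path + [succ]),
                )
    return None, float("inf")  # goal unreachable
```

With an admissible heuristic, the first expansion of the goal is guaranteed optimal, which is why an A*-style search can certify an optimal action sequence without exhausting the graph.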
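The Dataset Splits and Experiment Setup rows report a 4-to-1 train/validation split and training with Adam (learning rate 0.001, batch size 256, 100 epochs) on the negative log-likelihood of each transition. Since the paper does not name a framework (see the Software Dependencies row), the sketch below assumes PyTorch; the network shapes, the placeholder data, the diagonal noise covariance, and the additive combination of `h_net` and `phi_net` are all illustrative assumptions, not the authors' exact SCM parametrization.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data standing in for extracted patient transitions (shapes assumed).
states, actions, next_states = torch.randn(5000, 8), torch.randn(5000, 1), torch.randn(5000, 8)

# Illustrative stand-ins for the paper's Lipschitz-constrained networks h and phi.
h_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8))
phi_net = nn.Sequential(nn.Linear(8 + 1, 64), nn.ReLU(), nn.Linear(64, 8))
log_noise_var = nn.Parameter(torch.zeros(8))  # learnable diagonal covariance of the noise prior

transitions = TensorDataset(states, actions, next_states)
n_train = int(0.8 * len(transitions))  # 4-to-1 train/validation split, as in the paper
train_set, val_set = random_split(transitions, [n_train, len(transitions) - n_train])
loader = DataLoader(train_set, batch_size=256, shuffle=True)

params = list(h_net.parameters()) + list(phi_net.parameters()) + [log_noise_var]
opt = torch.optim.Adam(params, lr=1e-3)

for epoch in range(100):
    for s, a, s_next in loader:
        # Assumed mean decomposition; the paper's exact SCM form may differ.
        mean = h_net(s) + phi_net(torch.cat([s, a], dim=-1))
        dist = torch.distributions.Normal(mean, log_noise_var.exp().sqrt())
        loss = -dist.log_prob(s_next).sum(dim=-1).mean()  # NLL of each transition
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Model selection over the Lipschitz-constant grid would then compare validation-set log-likelihoods across trained SCMs, as the Dataset Splits row describes.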