Reliable Off-Policy Learning for Dosage Combinations

Authors: Jonas Schweisthal, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally perform an extensive evaluation of our method to show its effectiveness. To the best of our knowledge, ours is the first work to provide a method for reliable off-policy learning for optimal dosage combinations. ... We perform extensive experiments using semi-synthetic data from real-world medical settings to evaluate the effectiveness of our method."
Researcher Affiliation | Academia | "Jonas Schweisthal, Dennis Frauen, Valentyn Melnychuk & Stefan Feuerriegel, LMU Munich, Munich Center for Machine Learning, {jonas.schweisthal,frauen,melnychuk,feuerriegel}@lmu.de"
Pseudocode | Yes | "Algorithm 1: Reliable off-policy learning for dosage combinations"
Open Source Code | Yes | "Code is available at https://github.com/JSchweisthal/ReliableDosageCombi"
Open Datasets | Yes | "MIMIC-IV: MIMIC-IV [35] is a state-of-the-art dataset with de-identified health records from patients admitted to intensive care units. ... TCGA: TCGA [79] is a diverse collection of gene expression data from patients with different cancer types."
Dataset Splits | Yes | "For training, we split the data into train / val / test sets (64 / 16 / 20%)."
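The reported 64/16/20 split can be reproduced with two chained splits. Below is a minimal sketch assuming scikit-learn; `X`, `t`, and `y` are hypothetical covariate, dosage, and outcome arrays, and the seed is illustrative (the paper does not report one).

```python
# Minimal sketch of the 64/16/20 train/val/test split (assumption: scikit-learn).
import numpy as np
from sklearn.model_selection import train_test_split

seed = 42  # hypothetical seed, not from the paper

X = np.random.randn(1000, 10)  # placeholder covariates
t = np.random.rand(1000, 2)    # placeholder dosage combinations
y = np.random.randn(1000)      # placeholder outcomes

# First split off the 20% test set; then carve the validation set out of
# the remainder (0.16 / 0.80 = 0.20 of the remaining data).
X_tmp, X_test, t_tmp, t_test, y_tmp, y_test = train_test_split(
    X, t, y, test_size=0.20, random_state=seed)
X_train, X_val, t_train, t_val, y_train, y_val = train_test_split(
    X_tmp, t_tmp, y_tmp, test_size=0.20, random_state=seed)
```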
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions using the "Adam optimizer [43]" and "neural spline flows [15] in combination with masked auto-regressive networks [13]" but does not specify version numbers for these or for other software dependencies such as Python or PyTorch.
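Since no versions or libraries are pinned, any reconstruction of the dosage-density estimator involves guesswork. The sketch below shows one plausible setup: a conditional neural spline flow with a masked autoregressive transform, using the nflows library as an assumed choice. The dimensions, depth, and bin count are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch (not the authors' code): a conditional neural spline
# flow with a masked autoregressive transform, built with nflows.
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import (
    MaskedPiecewiseRationalQuadraticAutoregressiveTransform,
)

dim_t, dim_x = 2, 10  # hypothetical dosage and covariate dimensions

transform = CompositeTransform([
    MaskedPiecewiseRationalQuadraticAutoregressiveTransform(
        features=dim_t,          # dimensionality of the dosage vector t
        hidden_features=50,
        context_features=dim_x,  # condition the flow on covariates x
        num_bins=8,              # assumed spline bin count
        tails="linear",
        tail_bound=3.0,
    )
    for _ in range(2)            # assumed number of flow layers
])

flow = Flow(transform, StandardNormal(shape=[dim_t]))

# Conditional log-density log pi(t | x) for a batch of dosages/covariates.
t = torch.rand(16, dim_t)
x = torch.randn(16, dim_x)
log_prob = flow.log_prob(inputs=t, context=x)
```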
Experiment Setup | Yes | "For training our DCNet $\hat{\mu}(t, x)$, we follow previous work on the baseline VCNet [2, 56]. We set the representation network $\phi$ to a multi-layer perceptron (MLP) with two hidden layers of 50 hidden neurons each and ReLU activation. For the parameters of the prediction head $h$, we use the same model choices as for $\phi$. Additionally, we use B-splines of degree two and place the knots of the tensor product basis at $\{1/3, 2/3\}^p$. For the baseline MLP from Sec. 5, we ensure similar flexibility for a fair comparison, and we thus select an MLP with four hidden layers of 50 hidden units each and ReLU activation. We train the networks by minimizing the mean squared error (MSE) loss $\mathcal{L}_\mu = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{\mu}(t_i, x_i) - y_i\right)^2$. For optimization, we use the Adam optimizer [43] with batch size 1000 and train the network for a maximum of 800 epochs, using early stopping with a patience of 50 on the MSE loss on the factual validation dataset. We tune the learning rate within the search space $\{0.0001, 0.0005, 0.001, 0.005, 0.01\}$, and, for evaluation, we use the same criterion as for early stopping."
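For illustration, here is a minimal PyTorch sketch of the baseline MLP and the quoted training protocol (Adam, batch size 1000, at most 800 epochs, early stopping with patience 50 on validation MSE, learning-rate grid search). The DCNet spline head is omitted, and all variable names and data are placeholders, not the authors' implementation.

```python
# Sketch of the baseline MLP and training loop with the hyperparameters
# quoted above; placeholder data stands in for the factual train/val split.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

dim_t, dim_x = 2, 10  # hypothetical dimensions

class BaselineMLP(nn.Module):
    """Four hidden layers with 50 units each and ReLU, as quoted above."""
    def __init__(self, dim_in):
        super().__init__()
        layers, d = [], dim_in
        for _ in range(4):
            layers += [nn.Linear(d, 50), nn.ReLU()]
            d = 50
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1)).squeeze(-1)

def train(model, train_ds, val_ds, lr):
    loader = DataLoader(train_ds, batch_size=1000, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    best_val, best_state, waited = float("inf"), None, 0
    for _ in range(800):  # maximum of 800 epochs
        model.train()
        for t, x, y in loader:
            opt.zero_grad()
            mse(model(t, x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            t_v, x_v, y_v = val_ds.tensors
            val = mse(model(t_v, x_v), y_v).item()
        if val < best_val:  # early stopping on validation MSE
            best_val = val
            best_state = copy.deepcopy(model.state_dict())
            waited = 0
        else:
            waited += 1
            if waited >= 50:  # patience of 50 epochs
                break
    model.load_state_dict(best_state)
    return best_val

# Placeholder data; the paper uses the factual train/val sets instead.
t, x, y = torch.rand(2000, dim_t), torch.randn(2000, dim_x), torch.randn(2000)
train_ds = TensorDataset(t[:1600], x[:1600], y[:1600])
val_ds = TensorDataset(t[1600:], x[1600:], y[1600:])

# Tune the learning rate over the reported grid, selecting by validation
# MSE (the same criterion as early stopping).
best_val, best_lr = min(
    (train(BaselineMLP(dim_t + dim_x), train_ds, val_ds, lr), lr)
    for lr in [0.0001, 0.0005, 0.001, 0.005, 0.01]
)
```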