Estimating Calibrated Individualized Survival Curves with Deep Learning

Authors: Fahad Kamran, Jenna Wiens

Venue: AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared to state-of-the-art approaches across two publicly available datasets, our proposed training scheme leads to significant improvements in calibration, while maintaining good discriminative performance.
Researcher Affiliation | Academia | Fahad Kamran, Jenna Wiens; Computer Science and Engineering, University of Michigan, Ann Arbor, MI; fhdkmrn, wiensj@umich.edu
Pseudocode | No | The paper describes its methodology mathematically and verbally but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All deep models were built in PyTorch [1], while MTLR was implemented using the corresponding R package (Paszke et al. 2019; Haider 2019). [Footnote 1 points to https://github.com/MLD3/Calibrated-Survival-Analysis]
Open Datasets | Yes | We consider two publicly available datasets: the Northern Alberta Cancer Dataset (NACD) consists of 2,402 individuals with various forms of cancer (Haider et al. 2020; Yu et al. 2011). [...] CLINIC records the survival status of 6,036 patients in a hospital, with 13.2% being censored (Knaus et al. 1995).
Dataset Splits | Yes | We separate our data into training/validation/test sets using a 60/20/20% split.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch (with footnote 1 pointing to the code release) and the R package used for MTLR, but it does not give specific version numbers for these software components (e.g., a precise PyTorch release such as 1.9 or 1.10).
Experiment Setup | Yes | Across experiments, we use the same DRSA architecture: a one-layer LSTM with hidden size 100 and a single feed-forward layer with a sigmoid activation on the output for each time-step (Ren et al. 2019). We separate our data into training/validation/test sets using a 60/20/20% split. For training, we use Adam and a batch size of 50 (Kingma and Ba 2015). We train for 100 epochs (which, empirically, was enough for models to converge) and select the best model based on a validation set. (A minimal code sketch of this setup appears below the table.)
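
The Experiment Setup entry pins down the architecture and optimization hyperparameters. Below is a minimal PyTorch sketch of that setup, not the authors' released code: the input dimension, the number of discrete time steps, the placeholder covariates and labels, and the binary cross-entropy stand-in for DRSA's survival objective are all assumptions introduced for illustration. Only the one-layer LSTM with hidden size 100, the per-time-step sigmoid head, the 60/20/20 split, Adam, batch size 50, and 100 epochs come from the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical dimensions; the reproducibility report does not fix these.
INPUT_DIM = 32        # covariates per time step (placeholder)
NUM_TIMESTEPS = 50    # discretized time horizon (placeholder)

class DRSASketch(nn.Module):
    """One-layer LSTM (hidden size 100) with a shared linear + sigmoid
    head applied at every time step, as described in the setup above."""
    def __init__(self, input_dim, hidden_size=100):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_size, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, NUM_TIMESTEPS, INPUT_DIM)
        h, _ = self.lstm(x)                    # (batch, T, hidden)
        out = torch.sigmoid(self.head(h))      # per-time-step output in (0, 1)
        return out.squeeze(-1)                 # (batch, T)

# Illustrative 60/20/20 train/validation/test split (N = 2,402, e.g. NACD).
N = 2402
X = torch.randn(N, NUM_TIMESTEPS, INPUT_DIM)          # placeholder covariates
y = torch.randint(0, 2, (N, NUM_TIMESTEPS)).float()   # placeholder labels
dataset = TensorDataset(X, y)
n_train, n_val = int(0.6 * N), int(0.2 * N)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, N - n_train - n_val])

model = DRSASketch(INPUT_DIM)
optimizer = torch.optim.Adam(model.parameters())
train_loader = DataLoader(train_set, batch_size=50, shuffle=True)
loss_fn = nn.BCELoss()  # stand-in for DRSA's survival-specific objective

for epoch in range(100):  # "100 epochs ... enough for models to converge"
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    # Evaluate on val_set here and checkpoint the best model
    # ("select the best model based on a validation set"); omitted for brevity.
```

Note that DRSA is trained with a likelihood-based survival objective over the discretized horizon; the BCE loss above only keeps the sketch self-contained and runnable.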