reproducibilityindex.ai

Estimating Calibrated Individualized Survival Curves with Deep Learning

Authors: Fahad Kamran, Jenna Wiens (pp. 240-248)

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Compared to state-of-the-art approaches across two publicly available datasets, our proposed training scheme leads to significant improvements in calibration, while maintaining good discriminative performance.
Researcher Affiliation	Academia	Fahad Kamran, Jenna Wiens; Computer Science and Engineering, University of Michigan, Ann Arbor, MI; {fhdkmrn, wiensj}@umich.edu
Pseudocode	No	The paper describes methodologies mathematically and verbally but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	All deep models were built in PyTorch (footnote 1), while MTLR was implemented using the corresponding R package (Paszke et al. 2019; Haider 2019). [Footnote 1 points to https://github.com/MLD3/Calibrated-Survival-Analysis]
Open Datasets	Yes	We consider two publicly available datasets: the Northern Alberta Cancer Dataset (NACD) consists of 2,402 individuals with various forms of cancer (Haider et al. 2020; Yu et al. 2011). [...] CLINIC records the survival status of 6,036 patients in a hospital, with 13.2% being censored (Knaus et al. 1995).
Dataset Splits	Yes	We separate our data into training/validation/test sets using a 60/20/20% split.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies	No	The paper mentions PyTorch and an R package for MTLR, but it gives no version numbers for either. The '1' that follows 'PyTorch' in the extracted text is a footnote marker pointing to the code repository, not a version number.
Experiment Setup	Yes	Across experiments, we use the same DRSA architecture: a one-layer LSTM with hidden size 100 and a single feed-forward layer with a sigmoid activation on the output for each time-step (Ren et al. 2019). We separate our data into training/validation/test sets using a 60/20/20% split. For training, we use Adam and a batch size of 50 (Kingma and Ba 2015). We train for 100 epochs (which, empirically, was enough for models to converge) and select the best model based on a validation set.
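
The split and batching described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the paper excerpt does not say whether the 60/20/20 split is random or stratified, which seed was used, or how fractional sizes are rounded, so the shuffling, the seed, the rounding rule, and the helper names `split_indices` and `batches` are all hypothetical. Only the 60/20/20 proportions, the batch size of 50, and the NACD size of 2,402 come from the text.

```python
import random


def split_indices(n, seed=0):
    """Shuffle range(n) and cut it 60/20/20 into train/validation/test.

    Assumption: a simple random split with a fixed seed; the source
    does not specify the splitting procedure.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding
    n_val = n * 20 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder absorbs rounding leftovers
    return train, val, test


def batches(indices, batch_size=50):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


# NACD has 2,402 individuals according to the table above.
train, val, test = split_indices(2402)
print(len(train), len(val), len(test))  # prints: 1441 480 481
```

With these assumptions, one pass over the NACD training portion takes 29 mini-batches of at most 50 examples each.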