Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Estimating Calibrated Individualized Survival Curves with Deep Learning
Authors: Fahad Kamran, Jenna Wiens
AAAI 2021, pp. 240-248
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to state-of-the-art approaches across two publicly available datasets, our proposed training scheme leads to significant improvements in calibration, while maintaining good discriminative performance. |
| Researcher Affiliation | Academia | Fahad Kamran, Jenna Wiens; Computer Science and Engineering, University of Michigan, Ann Arbor, MI; fhdkmrn, EMAIL |
| Pseudocode | No | The paper describes methodologies mathematically and verbally but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All deep models were built in PyTorch¹, while MTLR was implemented using the corresponding R package (Paszke et al. 2019; Haider 2019). [Footnote 1 points to https://github.com/MLD3/Calibrated-Survival-Analysis] |
| Open Datasets | Yes | We consider two publicly available datasets: the Northern Alberta Cancer Dataset (NACD) consists of 2,402 individuals with various forms of cancer (Haider et al. 2020; Yu et al. 2011). [...] CLINIC records the survival status of 6,036 patients in a hospital, with 13.2% being censored (Knaus et al. 1995). |
| Dataset Splits | Yes | We separate our data into training/validation/test sets using a 60/20/20% split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions PyTorch for the deep models and an R package for MTLR, but it does not give version numbers for either (e.g., a precise PyTorch release such as '1.9' or '1.10'). The '1' that follows 'PyTorch' in the extracted text is a footnote marker, not a version. |
| Experiment Setup | Yes | Across experiments, we use the same DRSA architecture: a one-layer LSTM with hidden size 100 and a single feed-forward layer with a sigmoid activation on the output for each time-step (Ren et al. 2019). We separate our data into training/validation/test sets using a 60/20/20% split. For training, we use Adam and a batch size of 50 (Kingma and Ba 2015). We train for 100 epochs (which, empirically, was enough for models to converge) and select the best model based on a validation set. |
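The Experiment Setup row combines a 60/20/20% data split, a DRSA model that emits a sigmoid output at every time step, and validation-based model selection. The following is a minimal pure-Python sketch of those three pieces, not the authors' code. It assumes, following the DRSA formulation of Ren et al. (2019), that each per-step sigmoid output is read as a conditional hazard h_l = P(event at step l | no event before l), so that the survival curve is S(l) = ∏_{i≤l} (1 − h_i). The LSTM (hidden size 100), Adam, and the batch size of 50 are not reproduced. All function names are hypothetical, and the unstratified split, the fixed seed, and the use of validation loss as the selection criterion are likewise assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_60_20_20(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition indices 0..n-1 into ~60% train, ~20% validation, ~20% test.

    Assumption: an unstratified random split with a hypothetical fixed seed.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded RNG so the split is repeatable
    n_train = int(0.6 * n)
    n_val = int(0.2 * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to test, so no sample is dropped
    return train, val, test


def survival_from_hazards(hazards: Sequence[float]) -> List[float]:
    """Convert per-step conditional hazards h_l into a survival curve S(l).

    Assumption (DRSA-style): h_l = P(event at step l | no event before l),
    hence S(l) = prod_{i <= l} (1 - h_i).
    """
    survival = []
    running = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("hazards must lie in [0, 1]")
        running *= 1.0 - h  # probability of surviving past this step
        survival.append(running)
    return survival


def event_probabilities(hazards: Sequence[float]) -> List[float]:
    """P(event exactly at step l) = h_l * S(l - 1), with S(0) = 1."""
    probs = []
    prev_survival = 1.0
    for h in hazards:
        probs.append(h * prev_survival)
        prev_survival *= 1.0 - h
    return probs


def best_epoch(val_losses: Sequence[float]) -> int:
    """Index of the epoch with the lowest validation loss.

    Assumption: "best model based on a validation set" is taken to mean
    lowest validation loss; the paper's exact criterion is not stated here.
    """
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Usage: split the 2,402 NACD individuals, then turn one patient's
# hypothetical per-step sigmoid outputs into a survival curve.
train, val, test = split_60_20_20(2402)
print(len(train), len(val), len(test))  # 1441 480 481

hazards = [0.1, 0.2, 0.5]
curve = survival_from_hazards(hazards)
print(curve)
```

Because the test set takes the remainder after truncating the train and validation sizes, every individual lands in exactly one split. The event probabilities plus the final survival value always sum to 1, which gives a quick sanity check on any hazard sequence.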