X-CAL: Explicit Calibration for Survival Analysis

Authors: Mark Goldstein, Xintian Han, Aahlad Puli, Adler Perotte, Rajesh Ranganath

NeurIPS 2020

Reproducibility assessment. Each entry lists the variable, the assessed result, and supporting evidence from the paper.
Research Type: Experimental. Evidence: "In our experiments, we fit a variety of shallow and deep models on simulated data, a survival dataset based on MNIST, on length-of-stay prediction using MIMIC-III data, and on brain cancer data from The Cancer Genome Atlas. We show that the models we study can be miscalibrated. We give experimental evidence on these datasets that X-CAL improves D-CALIBRATION without a large decrease in concordance or likelihood."
Researcher Affiliation: Academia. Evidence: Mark Goldstein (New York University, goldstein@nyu.edu); Xintian Han (New York University, xintian.han@nyu.edu); Aahlad Puli (New York University, aahlad@nyu.edu); Adler J. Perotte (Columbia University, adler.perotte@columbia.edu); Rajesh Ranganath (New York University, rajeshr@cims.nyu.edu)
Pseudocode: No. Evidence: The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code: Yes. Evidence: "Code is available at https://github.com/rajesh-lab/X-CAL"
Open Datasets: Yes. Evidence: "We simulate a survival dataset conditionally on the MNIST dataset [LeCun et al., 2010]."; "predict the length of stay (in number of hours) in the ICU, using data from the MIMIC-III dataset [Johnson et al., 2016]"; "the glioma (a type of brain cancer) dataset collected as part of the TCGA program and studied in [Network, 2015]" (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/studied-cancers/glioma)
Dataset Splits: Yes. Evidence: "We sample train/validation/test sets with 100k/50k/50k datapoints, respectively."; "We use PyTorch's MNIST with test split into validation/test."; "There are 2,925,434 and 525,912 instances in the training and test sets. We split the training set in half for train and validation."; "The train/validation/test sets are made of 552/276/277 datapoints respectively."
Hardware Specification: No. Evidence: The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies: No. Evidence: The paper mentions PyTorch's MNIST and the Lifelines package but does not specify version numbers for these software components.
Experiment Setup: No. Evidence: The paper states "We use γ = 10000.", "We use 20 D-CALIBRATION bins disjoint over [0, 1] for all experiments except for the cancer data, where we use 10 bins as in Haider et al. [2020].", and "All reported results are an average of three seeds.", but does not include other specific hyperparameters such as learning rates, batch sizes, or optimizer details.
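To make the binning and γ settings above concrete, the sketch below computes the uncensored D-calibration statistic (deviation of per-bin CDF mass from uniform) and a differentiable soft-bin relaxation whose sharpness is controlled by γ. This is a minimal NumPy illustration, not the authors' released code: the function names are mine, and the sigmoid-difference soft indicator is one common construction consistent with γ's role as a sharpness parameter.

```python
import numpy as np

def d_calibration(cdf_values, n_bins=20):
    """D-calibration statistic for uncensored data: squared deviation of
    each bin's empirical mass from the uniform mass 1/n_bins.

    cdf_values holds the model CDF evaluated at each observed event time,
    F(t_i | x_i); a D-calibrated model makes these uniform on [0, 1].
    """
    cdf_values = np.asarray(cdf_values, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(cdf_values, bins=edges)
    proportions = counts / len(cdf_values)
    return float(np.sum((proportions - 1.0 / n_bins) ** 2))

def _sigmoid(z):
    # Clip to avoid overflow in exp for very sharp (large-gamma) settings.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def soft_bin_mass(cdf_values, lo, hi, gamma=10000.0):
    """Differentiable relaxation of the bin indicator 1[lo <= F <= hi],
    built as a difference of two sigmoids. Large gamma (the paper reports
    gamma = 10000) makes the relaxation nearly hard.
    """
    u = np.asarray(cdf_values, dtype=float)
    soft = _sigmoid(gamma * (u - lo)) - _sigmoid(gamma * (u - hi))
    return float(soft.mean())

# A calibrated model yields uniform CDF values, so the statistic is near 0;
# a degenerate model that puts every point in one bin scores much higher.
rng = np.random.default_rng(0)
d_good = d_calibration(rng.uniform(size=50_000))   # near zero
d_bad = d_calibration(np.full(200, 0.5))           # all mass in one of 20 bins
```

With 20 bins, the degenerate case above scores (1 - 1/20)^2 + 19 * (1/20)^2 = 0.95, which is why the paper's 20-bin (10 for the cancer data) choice directly shapes the scale of reported D-calibration values.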