Proper Scoring Rules for Survival Analysis

Authors: Hiroki Yanagisawa

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we compared practical prediction performances of various loss functions on real datasets. We used three datasets for survival analysis from packages in R (R Core Team, 2016): the flchain dataset (Dispenzieri et al., 2012), obtained from the survival package, which contains 7874 data points (69.9% of which are censored); the prostateSurvival dataset (Lu-Yao et al., 2009), obtained from the asaur package, which contains 14294 data points (71.7% of which are censored); and the support dataset (Knaus et al., 1995), obtained from the casebase package, which contains 9104 data points (31.9% of which are censored).
Researcher Affiliation | Industry | IBM Research Tokyo, Tokyo, Japan. Correspondence to: Hiroki Yanagisawa <yanagis@jp.ibm.com>.
Pseudocode | No | The paper describes algorithms (e.g., a grid-search algorithm and an iterative reweighting algorithm) but does not present them in a structured pseudocode block or an algorithm figure.
Open Source Code | Yes | Our implementation of the scoring rules is available at https://github.com/IBM/dqs.
Open Datasets | Yes | We used three datasets for survival analysis from packages in R (R Core Team, 2016): the flchain dataset (Dispenzieri et al., 2012) from the survival package (7874 data points, 69.9% censored); the prostateSurvival dataset (Lu-Yao et al., 2009) from the asaur package (14294 data points, 71.7% censored); and the support dataset (Knaus et al., 1995) from the casebase package (9104 data points, 31.9% censored). (See the loading sketch below the table.)
Dataset Splits | Yes | In these experiments, we split the data points into training (60%), validation (20%), and test (20%), and each bar shows the mean of the measurements on the test data of five random splits together with the error bar, which represents the standard deviation. (See the splitting sketch below the table.)
Hardware Specification | Yes | All our experiments were conducted on a virtual machine with an Intel Xeon CPU (3.30 GHz) processor, no GPU, and 64 GB of memory, running Red Hat Enterprise Linux Server 7.6.
Software Dependencies | Yes | We used Python 3.7.4 and PyTorch 1.7.1 for the implementation.
Experiment Setup | Yes | For the training of the neural network, we used the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.001, and the other parameters were set to their default values. We ran training for 300 epochs for our neural network models. Our implementation of the scoring rules is available at https://github.com/IBM/dqs. (See the training-setup sketch below the table.)
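The datasets in the Open Datasets row ship with R packages rather than as standalone files. The sketch below is a minimal, assumed workflow, not part of the authors' repository at https://github.com/IBM/dqs: it presumes the three datasets have been exported from R to CSV (e.g., with write.csv), and the file and column names are taken from the packages' documentation and may need adjustment.

```python
# Minimal sketch: load the three R survival datasets after exporting them to CSV.
# File names (flchain.csv, prostateSurvival.csv, support.csv) and column names
# are assumptions, not paths or identifiers used by the paper or its repository.
import pandas as pd

def load_survival_csv(path, time_col, event_col):
    """Return covariates X, observed time t, and event indicator delta (0 = censored)."""
    df = pd.read_csv(path)
    t = df[time_col].to_numpy(dtype=float)
    delta = df[event_col].to_numpy(dtype=int)
    X = df.drop(columns=[time_col, event_col])
    return X, t, delta

flchain = load_survival_csv("flchain.csv", time_col="futime", event_col="death")
prostate = load_survival_csv("prostateSurvival.csv", time_col="survTime", event_col="status")
support = load_survival_csv("support.csv", time_col="d.time", event_col="death")
```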
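The Dataset Splits row reports 60%/20%/20% train/validation/test splits, with results averaged over five random splits. A minimal sketch of such a protocol using scikit-learn's train_test_split is given below; the seed values and helper function are illustrative assumptions, not the authors' code.

```python
# Sketch of the 60/20/20 split repeated over five random seeds.
from sklearn.model_selection import train_test_split

def split_60_20_20(X, t, delta, seed):
    # First hold out 40% of the data, then halve the held-out part into validation and test.
    X_tr, X_rest, t_tr, t_rest, d_tr, d_rest = train_test_split(
        X, t, delta, test_size=0.4, random_state=seed)
    X_val, X_te, t_val, t_te, d_val, d_te = train_test_split(
        X_rest, t_rest, d_rest, test_size=0.5, random_state=seed)
    return (X_tr, t_tr, d_tr), (X_val, t_val, d_val), (X_te, t_te, d_te)

# Five random splits, whose test-set means and standard deviations are reported in the paper.
# X, t, delta are assumed to come from a loading step such as the sketch above.
splits = [split_60_20_20(X, t, delta, seed) for seed in range(5)]
```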
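The Experiment Setup row specifies Adam with learning rate 0.001 (other optimizer parameters at their defaults) and 300 training epochs. Below is a minimal PyTorch sketch under those reported settings; the network architecture, loss function, and data loader are placeholders and do not reproduce the models or scoring rules from https://github.com/IBM/dqs.

```python
# Sketch of the reported training configuration: Adam, lr=0.001, defaults otherwise, 300 epochs.
# Model, loss, and data loader are placeholders, not the paper's implementation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 32))  # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def placeholder_loss(output, time, event):
    # Stand-in for the survival scoring rules studied in the paper.
    return output.mean()

for epoch in range(300):
    for x, time, event in train_loader:  # train_loader assumed to yield (covariates, time, event)
        optimizer.zero_grad()
        loss = placeholder_loss(model(x), time, event)
        loss.backward()
        optimizer.step()
```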