Proper Scoring Rules for Survival Analysis

Authors: Hiroki Yanagisawa

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we compared practical prediction performances of various loss functions on real datasets. We used three datasets for survival analysis from packages in R (R Core Team, 2016): the flchain dataset (Dispenzieri et al., 2012), obtained from the survival package, which contains 7874 data points (69.9% of which are censored); the prostateSurvival dataset (Lu-Yao et al., 2009), obtained from the asaur package, which contains 14294 data points (71.7% of which are censored); and the support dataset (Knaus et al., 1995), obtained from the casebase package, which contains 9104 data points (31.9% of which are censored).
Researcher Affiliation | Industry | IBM Research Tokyo, Tokyo, Japan. Correspondence to: Hiroki Yanagisawa <yanagis@jp.ibm.com>.
Pseudocode | No | The paper describes algorithms (e.g., a grid-search algorithm and an iterative reweighting algorithm) but does not present them in a structured pseudocode block or an algorithm figure.
Open Source Code | Yes | Our implementation of the scoring rules is available at https://github.com/IBM/dqs.
Open Datasets | Yes | We used three datasets for survival analysis from packages in R (R Core Team, 2016): the flchain dataset (Dispenzieri et al., 2012) from the survival package (7874 data points, 69.9% censored); the prostateSurvival dataset (Lu-Yao et al., 2009) from the asaur package (14294 data points, 71.7% censored); and the support dataset (Knaus et al., 1995) from the casebase package (9104 data points, 31.9% censored). (See the loading sketch below the table.)
Dataset Splits | Yes | In these experiments, we split the data points into training (60%), validation (20%), and test (20%), and each bar shows the mean of the measurements on the test data of five random splits together with the error bar, which represents the standard deviation. (See the splitting sketch below the table.)
Hardware Specification | Yes | All our experiments were conducted on a virtual machine with an Intel Xeon CPU (3.30 GHz) processor, no GPU, and 64 GB of memory, running Red Hat Enterprise Linux Server 7.6.
Software Dependencies | Yes | We used Python 3.7.4 and PyTorch 1.7.1 for the implementation.
Experiment Setup | Yes | For the training of the neural network, we used the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.001, and the other parameters were set to their default values. We ran training for 300 epochs for our neural network models. Our implementation of the scoring rules is available at https://github.com/IBM/dqs. (See the training-setup sketch below the table.)
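The datasets in the Open Datasets row ship with R packages rather than as standalone files. The sketch below is a minimal, assumed workflow, not part of the authors' repository at https://github.com/IBM/dqs: it presumes the three datasets have been exported from R to CSV (e.g., with write.csv), and the file and column names are taken from the packages' documentation and may need adjustment.

```python
# Minimal sketch: load the three R survival datasets after exporting them to CSV.
# File names (flchain.csv, prostateSurvival.csv, support.csv) and column names
# are assumptions, not paths or identifiers used by the paper or its repository.
import pandas as pd

def load_survival_csv(path, time_col, event_col):
    """Return covariates X, observed time t, and event indicator delta (0 = censored)."""
    df = pd.read_csv(path)
    t = df[time_col].to_numpy(dtype=float)
    delta = df[event_col].to_numpy(dtype=int)
    X = df.drop(columns=[time_col, event_col])
    return X, t, delta

flchain = load_survival_csv("flchain.csv", time_col="futime", event_col="death")
prostate = load_survival_csv("prostateSurvival.csv", time_col="survTime", event_col="status")
support = load_survival_csv("support.csv", time_col="d.time", event_col="death")
```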
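The Dataset Splits row reports 60%/20%/20% train/validation/test splits, with results averaged over five random splits. A minimal sketch of such a protocol using scikit-learn's train_test_split is given below; the seed values and helper function are illustrative assumptions, not the authors' code.

```python
# Sketch of the 60/20/20 split repeated over five random seeds.
from sklearn.model_selection import train_test_split

def split_60_20_20(X, t, delta, seed):
    # First hold out 40% of the data, then halve the held-out part into validation and test.
    X_tr, X_rest, t_tr, t_rest, d_tr, d_rest = train_test_split(
        X, t, delta, test_size=0.4, random_state=seed)
    X_val, X_te, t_val, t_te, d_val, d_te = train_test_split(
        X_rest, t_rest, d_rest, test_size=0.5, random_state=seed)
    return (X_tr, t_tr, d_tr), (X_val, t_val, d_val), (X_te, t_te, d_te)

# Five random splits, whose test-set means and standard deviations are reported in the paper.
# X, t, delta are assumed to come from a loading step such as the sketch above.
splits = [split_60_20_20(X, t, delta, seed) for seed in range(5)]
```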
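The Experiment Setup row specifies Adam with learning rate 0.001 (other optimizer parameters at their defaults) and 300 training epochs. Below is a minimal PyTorch sketch under those reported settings; the network architecture, loss function, and data loader are placeholders and do not reproduce the models or scoring rules from https://github.com/IBM/dqs.

```python
# Sketch of the reported training configuration: Adam, lr=0.001, defaults otherwise, 300 epochs.
# Model, loss, and data loader are placeholders, not the paper's implementation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 32))  # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def placeholder_loss(output, time, event):
    # Stand-in for the survival scoring rules studied in the paper.
    return output.mean()

for epoch in range(300):
    for x, time, event in train_loader:  # train_loader assumed to yield (covariates, time, event)
        optimizer.zero_grad()
        loss = placeholder_loss(model(x), time, event)
        loss.backward()
        optimizer.step()
```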