Proper Scoring Rules for Survival Analysis
Author: Hiroki Yanagisawa
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we compared practical prediction performances of various loss functions on real datasets. We used three datasets for the survival analysis from the packages in R (R Core Team, 2016): the flchain dataset (Dispenzieri et al., 2012), which was obtained from the survival package and contains 7874 data points (69.9% of which are censored), the prostateSurvival dataset (Lu-Yao et al., 2009), which was obtained from the asaur package and contains 14294 data points (71.7% of which are censored), and the support dataset (Knaus et al., 1995), which was obtained from the casebase package and contains 9104 data points (31.9% of which are censored). |
| Researcher Affiliation | Industry | 1IBM Research Tokyo, Tokyo, Japan. Correspondence to: Hiroki Yanagisawa <yanagis@jp.ibm.com>. |
| Pseudocode | No | The paper describes algorithms (e.g., grid search algorithm, iterative reweighting algorithm) but does not present them in a structured pseudocode block or an algorithm figure. |
| Open Source Code | Yes | Our implementation of the scoring rules is available at https://github.com/IBM/dqs. |
| Open Datasets | Yes | We used three datasets for the survival analysis from the packages in R (R Core Team, 2016): the flchain dataset (Dispenzieri et al., 2012), which was obtained from the survival package and contains 7874 data points (69.9% of which are censored), the prostateSurvival dataset (Lu-Yao et al., 2009), which was obtained from the asaur package and contains 14294 data points (71.7% of which are censored), and the support dataset (Knaus et al., 1995), which was obtained from the casebase package and contains 9104 data points (31.9% of which are censored). (A hedged Python loading sketch follows the table.) |
| Dataset Splits | Yes | In these experiments, we split the data points into training (60%), validation (20%), and test (20%), and each bar shows the mean of the measurements on the test data of five random splits together with the error bar, which represents the standard deviation. (A split sketch follows the table.) |
| Hardware Specification | Yes | All our experiments were conducted on a virtual machine with an Intel Xeon CPU (3.30 GHz) processor without any GPU and 64 GB of memory running Red Hat Enterprise Linux Server 7.6. |
| Software Dependencies | Yes | We used Python 3.7.4 and PyTorch 1.7.1 for the implementation. |
| Experiment Setup | Yes | For the training of the neural network, we used the Adam optimizer (Kingma & Ba, 2015) with the learning rate 0.001, and the other parameters were set to their default values. We ran training for 300 epochs for our neural network models. Our implementation of the scoring rules is available at https://github.com/IBM/dqs. (A minimal training-loop sketch follows the table.) |
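
The paper obtains all three datasets from R packages. For readers working in Python (the paper's implementation language), the sketch below is one plausible loading path, not the authors' pipeline: scikit-survival happens to ship a copy of flchain, while the prostateSurvival and support tables are assumed here to be CSV exports from their R packages; the file names are hypothetical.

```python
# Hedged sketch: one way to load the three datasets in Python.
# flchain ships with scikit-survival; the other two are assumed to be
# CSV exports of the R datasets (asaur::prostateSurvival, casebase::support).
import pandas as pd
from sksurv.datasets import load_flchain

# flchain: X is a pandas DataFrame of covariates, y a structured array
# with fields ("death", "futime") for the event indicator and time.
X, y = load_flchain()
print(X.shape, y.dtype.names)

# Hypothetical CSV exports from R, e.g. write.csv(asaur::prostateSurvival, ...)
prostate = pd.read_csv("prostateSurvival.csv")  # hypothetical file name
support = pd.read_csv("support.csv")            # hypothetical file name
```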
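
The Dataset Splits row describes a 60/20/20 train/validation/test split repeated over five random seeds, with the mean and standard deviation of the test-set measurements reported. A minimal sketch of that protocol, assuming a pandas DataFrame `df` and hypothetical `fit_model` and `evaluate` helpers:

```python
# Hedged sketch of the 60/20/20 split repeated over five random seeds.
import numpy as np
from sklearn.model_selection import train_test_split

def split_60_20_20(df, seed):
    # Carve off 60% for training, then halve the remainder into
    # validation (20%) and test (20%).
    train, rest = train_test_split(df, train_size=0.6, random_state=seed)
    val, test = train_test_split(rest, train_size=0.5, random_state=seed)
    return train, val, test

scores = []
for seed in range(5):  # five random splits, as in the paper
    train, val, test = split_60_20_20(df, seed)  # df: hypothetical DataFrame
    model = fit_model(train, val)                # hypothetical training routine
    scores.append(evaluate(model, test))         # hypothetical test metric

# The paper's bar plots report mean +/- standard deviation over the splits.
print(np.mean(scores), np.std(scores))
```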
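
The Experiment Setup row fixes the optimizer (Adam with learning rate 0.001, other parameters at their defaults) and a 300-epoch budget. The PyTorch sketch below mirrors that configuration only; the network architecture, data loader, and scoring-rule loss are placeholders, since the actual implementations are in the linked IBM/dqs repository:

```python
# Hedged sketch of the reported training configuration:
# Adam with lr=0.001 (other parameters at their defaults), 300 epochs.
import torch
import torch.nn as nn

in_features, out_features = 10, 1  # illustrative dimensions only

model = nn.Sequential(             # placeholder network; the paper's
    nn.Linear(in_features, 64),    # architecture is not reproduced here
    nn.ReLU(),
    nn.Linear(64, out_features),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(300):  # 300 epochs, as reported in the paper
    for x_batch, target in train_loader:  # hypothetical DataLoader
        optimizer.zero_grad()
        loss = scoring_rule_loss(model(x_batch), target)  # placeholder loss
        loss.backward()
        optimizer.step()
```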