Differentiable sorting for censored time-to-event data.

Authors: Andre Vauvelle, Benjamin Wild, Roland Eils, Spiros Denaxas

NeurIPS 2023

Reproducibility assessment: each variable is listed with its result and the supporting LLM response (quotes cite line numbers in the paper).
Research Type: Experimental
LLM Response: "Our experiments reveal that Diffsurv outperforms established baselines in various simulated and real-world risk prediction scenarios." (Lines 13-14) "Furthermore, we demonstrate the algorithmic advantages of Diffsurv by presenting a novel method for top-k risk prediction that surpasses current methods." (Lines 14-16) "In our experiments, we aim to assess the performance of Diffsurv and compare it against the conventional Cox Partial Likelihood (CPL) methods." (Lines 245-246)
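For context on the CPL baseline the paper compares against, below is a minimal sketch of a negative Cox partial log-likelihood in PyTorch. This is our illustration (ties are handled naively, with no Breslow or Efron correction); the authors' own implementation may differ.

```python
import torch

def neg_cox_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (sketch; ties handled naively).

    risk:  (n,) predicted log-risk scores from the model
    time:  (n,) observed event or censoring times
    event: (n,) 1.0 if the event was observed, 0.0 if censored
    """
    # Sort by decreasing time so each prefix is that subject's risk set.
    order = torch.argsort(time, descending=True)
    risk, event = risk[order], event[order]
    # log-sum-exp of risk over everyone still at risk at each event time
    log_risk_set = torch.logcumsumexp(risk, dim=0)
    # Only uncensored subjects contribute terms to the partial likelihood.
    return -((risk - log_risk_set) * event).sum() / event.sum().clamp(min=1)
```

Typical usage would be `loss = neg_cox_partial_log_likelihood(model(x).squeeze(-1), time, event)`.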
Researcher Affiliation: Collaboration
LLM Response: Andre Vauvelle (1,2), Benjamin Wild (3), Roland Eils (3), Spiros Denaxas (1); University College London (1), BenevolentAI (2), Berlin Institute of Health (3). {andre.vauvelle.19, s.denaxas}@ucl.ac.uk, {benjamin.wild, roland.eils}@bih-charite.de
Pseudocode: No
LLM Response: The paper describes algorithms and methods in text and uses a diagram (Figure 1) to illustrate the process, but it does not contain structured pseudocode or algorithm blocks with numbered steps.
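Since the paper itself provides no pseudocode, the sketch below illustrates only the generic differentiable-sorting primitive that methods like Diffsurv build on, using the NeuralSort relaxation of Grover et al. [2019] as a stand-in. This is not the paper's algorithm, merely the kind of relaxed permutation such a method could be built from.

```python
import torch

def neuralsort(scores, tau=1.0):
    """NeuralSort relaxation (Grover et al., 2019): a differentiable,
    row-stochastic stand-in for the permutation that sorts `scores`
    in decreasing order. Row i softly selects the i-th largest score."""
    s = scores.reshape(-1, 1)                      # (n, 1)
    n = s.size(0)
    abs_diff_sums = (s - s.T).abs().sum(1)         # (n,): sum_k |s_j - s_k|
    i = torch.arange(1, n + 1, dtype=s.dtype, device=s.device).reshape(-1, 1)
    logits = (n + 1 - 2 * i) * s.T - abs_diff_sums # (n, n) by broadcasting
    return torch.softmax(logits / tau, dim=-1)

# As tau -> 0 this approaches the hard permutation matrix; predicted risk
# scores within a sampled risk set can then be softly ranked and compared
# with the orderings consistent with the (possibly censored) labels.
P = neuralsort(torch.tensor([0.3, 2.0, -1.0]), tau=0.1)
```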
Open Source Code: Yes
LLM Response: "Further details on the experimental setup, including compute time, are provided in Appendix B and at https://github.com/andre-vauvelle/diffsurv." (Lines 267-269)
Open Datasets: Yes
LLM Response: "Semi-synthetic survSVHN: Based on the Street View House Numbers (SVHN) dataset [Netzer et al., 2011]" (Lines 269-270) "We assess our methods on several public datasets: Four small, popular real-world survival datasets (FLCHAIN, NWTCO, SUPPORT, METABRIC) [Kvamme et al., 2019] and the MIMIC-IV Chest X-Ray dataset (CXR) with death as the event [Johnson et al., 2019]." (Lines 294-297)
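As an aside, the four small survival datasets are distributed with the pycox package from Kvamme et al. [2019]; assuming pycox is installed (pip install pycox), they can be loaded roughly as follows.

```python
# Sketch, assuming the pycox package; data is downloaded on first access.
from pycox.datasets import flchain, metabric, nwtco, support

for ds in (flchain, metabric, nwtco, support):
    df = ds.read_df()  # pandas DataFrame with duration and event columns
    print(df.shape)
```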
Dataset Splits: Yes
LLM Response: "Validation approach varies: for smaller datasets, we apply nested 5-fold cross-validation, while for imaging datasets we use train:val:test splits." (Lines 261-262) "For survSVHN the train:val:test split is provided by Netzer et al. [2011] as 230,755:5,000:13,068." (Lines 493-494) "Finally, the train:val:test split of 8:1:1 is done at the patient level, ensuring no images from a patient in the test set were found in the training data." (Lines 486-487)
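A patient-level 8:1:1 split like the one quoted can be reproduced with scikit-learn's GroupShuffleSplit. The snippet below is a sketch with synthetic patient IDs, not the authors' code; note that the fractions apply to patients, so the image counts are only approximately 8:1:1.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: one row per image, grouped by a hypothetical patient_id.
rng = np.random.default_rng(0)
patient_id = rng.integers(0, 1_000, size=10_000)
images = np.arange(10_000)

# 10% of patients to test, then 1/9 of the remainder to val => 8:1:1 overall.
outer = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
trainval_idx, test_idx = next(outer.split(images, groups=patient_id))
inner = GroupShuffleSplit(n_splits=1, test_size=1 / 9, random_state=0)
train_rel, val_rel = next(inner.split(trainval_idx, groups=patient_id[trainval_idx]))
train_idx, val_idx = trainval_idx[train_rel], trainval_idx[val_rel]

# No patient straddles a split boundary.
assert set(patient_id[train_idx]).isdisjoint(patient_id[test_idx])
assert set(patient_id[val_idx]).isdisjoint(patient_id[train_idx])
```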
Hardware Specification: Yes
LLM Response: "In the most demanding case, the MIMIC-IV CXR experiments, run on an 11GB NVIDIA GeForce GTX 1080 Ti, took roughly 18.5 hours per experiment." (Lines 520-521)
Software Dependencies: No
LLM Response: "All neural network baselines were implemented using PyTorch and PyTorch Lightning." (Lines 523-524) The paper names the software used but does not provide version numbers for these dependencies.
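Absent pinned versions, a reproducer can at least record the versions in use at runtime, e.g.:

```python
# Log the library versions actually used, since the paper does not pin them.
import torch
import pytorch_lightning as pl

print("torch", torch.__version__)
print("pytorch_lightning", pl.__version__)
```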
Experiment Setup: Yes
LLM Response: "We performed hyperparameter tuning for learning rate, weight decay, batch size, and risk set size." (Lines 263-264) Table 4 (hyperparameter values for the small real-world datasets):
- Learning rate: [0.1, 0.01, 0.001, 1e-4]
- Weight decay: [0.1, 0.01, 0.001, 1e-4, 1e-5, 0]
- (Batch size, risk set size): [(32, 8), (16, 16), (8, 32), (4, 64), (1, 256)]
(Lines 498-499) "For imaging datasets, we fix learning rate and weight decay for both CPL and Diffsurv. For both survSVHN and MIMIC-IV CXR, we use a fixed learning rate of 10^-4 and weight decay of 10^-5. We also used early stopping with a patience of 20 epochs and a maximum of 100,000 training steps." (Lines 500-502)
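The quoted search space and training budget translate to roughly the following, assuming PyTorch Lightning; the monitored metric ("val_loss") is our assumption, not stated in the paper.

```python
import itertools
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Table 4 grid; every (batch size, risk set size) pair multiplies to 256,
# so each optimization step always scores the same number of examples.
grid = list(itertools.product(
    [0.1, 0.01, 0.001, 1e-4],                          # learning rate
    [0.1, 0.01, 0.001, 1e-4, 1e-5, 0],                 # weight decay
    [(32, 8), (16, 16), (8, 32), (4, 64), (1, 256)],   # (batch, risk set)
))

# Training budget as described for the imaging datasets; the monitored
# quantity is our assumption.
trainer = Trainer(
    max_steps=100_000,
    callbacks=[EarlyStopping(monitor="val_loss", patience=20)],
)
```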