Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

NeuralSurv: Deep Survival Analysis with Bayesian Uncertainty Quantification

Authors: Mélodie Monod, Alessandro Micheli, Samir Bhatt

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments on synthetic and real survival datasets, in data-scarce settings, NeuralSurv consistently delivers superior calibration compared to state-of-the-art deep survival models, and matches or exceeds their discriminative performance.
Researcher Affiliation	Academia	Mélodie Monod Imperial College London London, United Kingdom EMAIL Alessandro Micheli Imperial College London London, United Kingdom EMAIL Samir Bhatt Imperial College London; University of Copenhagen London, United Kingdom; Copenhagen, Denmark EMAIL
Pseudocode	Yes	We provide an algorithmic description of our EM algorithm in Algorithm 1.
Open Source Code	Yes	The code to reproduce our experiments is available on the GitHub repository https://github.com/MLGlobalHealth/neuralsurv under the MIT License.
Open Datasets	Yes	The code to reproduce our experiments is available on the GitHub repository https://github.com/MLGlobalHealth/neuralsurv under the MIT License. [...] The real-world data used in our experiments are publicly available open-source datasets. The source of these datasets, including the specific package and version used to obtain them, is detailed in Section J.1.
Dataset Splits	Yes	The data is randomly partitioned into five equally sized folds, with each fold serving as a distinct train/test split, comprising 100 training samples and 25 test samples per fold. From the training set, 20% (central experiment: 20 samples, ablation experiment: 40 samples) was further attributed to the validation set.
Hardware Specification	Yes	Machine. The experiments were conducted on NVIDIA RTX A6000 GPUs with 48GB of memory.
Software Dependencies	Yes	The C-index and the IPCW IBS metrics are computed using the TorchSurv package [37]. TorchSurv: A Lightweight Package for Deep Survival Analysis. Journal of Open Source Software, 9(104):7341, 2024. (version 0.1.4). [...] Håvard Kvamme. pycox: Survival analysis with PyTorch. https://pypi.org/project/pycox/, 2024. (version 0.3.0).
Experiment Setup	Yes	All deep learning methods share the same neural network architecture, which is detailed in Section K. The benchmark deep survival models were trained using the Adam optimizer with a learning rate selected via grid search. Batch normalization was applied, and a dropout rate of 0.1 was used. Training was conducted for 1,000 epochs with a batch size of 256.
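
The Dataset Splits entry above can be illustrated with a minimal sketch. This is not the authors' code. It assumes 125 samples in total (implied by 100 training and 25 test samples per fold), uses only the Python standard library, and takes the validation samples out of the 100 training samples, as the quoted text describes. The function name `five_fold_splits`, its parameters, and the seed are hypothetical.

```python
import random


def five_fold_splits(n_samples=125, n_folds=5, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists, one triple per fold.

    Assumes n_samples is divisible by n_folds, so all folds are equal in size.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)  # random partition into folds
    fold_size = n_samples // n_folds

    for k in range(n_folds):
        # Fold k is the test set; the other folds form the training pool.
        test = indices[k * fold_size:(k + 1) * fold_size]
        test_set = set(test)
        pool = [i for i in indices if i not in test_set]

        # Hold out a fraction of the training pool for validation.
        rng.shuffle(pool)
        n_val = round(val_fraction * len(pool))
        val, train = pool[:n_val], pool[n_val:]
        yield train, val, test


if __name__ == "__main__":
    for fold, (train, val, test) in enumerate(five_fold_splits()):
        print(fold, len(train), len(val), len(test))
```

With the defaults, each fold yields 80 training, 20 validation, and 25 test indices. The 40-sample validation set reported for the ablation experiment would correspond to a larger training pool, whose size is not stated in this record.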