Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DoseSurv: Predicting Personalized Survival Outcomes under Continuous-Valued Treatments

Authors: Moritz Gögl, Yu Liu, Christopher Yau, Peter Watkinson, Tingting Zhu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present experiments across various treatment scenarios on both simulated and real-world data, demonstrating DoseSurv's superior performance over existing baseline models. ... 5 Experiments 5.1 Experimental Setup 5.2 Results and Discussion
Researcher Affiliation	Academia	Moritz Gögl (1), Yu Liu (1), Christopher Yau (1,2), Peter Watkinson (1), Tingting Zhu (1). (1) University of Oxford, (2) Health Data Research UK
Pseudocode	No	The paper describes the methodology using text and mathematical equations in Sections 3 and 4, and refers to Fig. 1 for the model architecture, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	No	The source code of DoseSurv will be made available at: https://github.com/mgoegl/DoseSurv.
Open Datasets	Yes	Additionally, we evaluate DoseSurv on the Twins dataset, a well-established benchmark in the HTE literature for binary treatments. This dataset provides survival times for twins born in the United States between 1989 and 1991 [3]. ... We use data from 1,545 breast cancer patients in the Rotterdam Tumor Bank [22]. ... For testing, we use data from 686 patients enrolled in a randomized controlled trial conducted by the German Breast Cancer Study Group (GBSG) [59]
Dataset Splits	Yes	Across all scenarios, we employ 17,000 samples for training, 1,000 samples for validation, and 2,000 samples for testing. ... The final cohort contains 5,601 samples. For our analysis, we use 70% of samples for training. From the remaining samples, we extract 30% for validation and 70% for testing. ... this observational dataset, we use these data for model training (85% training, 15% validation).
Hardware Specification	Yes	Experiments were performed on a single GPU machine (specifications: CPU Intel Xeon W5-2445, GPU NVIDIA RTX A4000, and RAM 64GB DDR5, OS Linux).
Software Dependencies	No	DoseSurv is implemented in PyTorch. ... Network parameters were optimized using Adam optimizer [38]... We benchmarked DoseSurv against publicly available implementations of four state-of-the-art machine and deep learning models for survival analysis, DeepSurv, DeepHit, RSF and NSC. ... scikit-survival [54]
Experiment Setup	Yes	Across all main experiments, we implemented DoseSurv with 1 hidden layer in the representation network and 3 hidden layers in each treatment-specific head of the RBF hazard estimator, each layer comprising 100 nodes. For each layer in the RBF hazard estimator, we use 5 Gaussian RBFs with centers initialized at {0, 0.25, 0.5, 0.75, 1}. The shape parameters σ^{(a)}_{l,k} are initialized at 0.7. For each experiment, we chose γ from {0.1, 0.01}... Network parameters were optimized using Adam optimizer [38] with learning rate of 0.001. We use ReLU activation functions, dropout probability of 0.1, and a batch size of 500. Moreover, we implement DoseSurv with batch normalization layers, and employ early stopping after 30 epochs without model improvement on the validation data. ... we choose Q = 3 and T = 5.
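
The nested split quoted in the Dataset Splits row (70% for training, then 30% of the remainder for validation and 70% for testing) can be turned into sample counts with a little arithmetic. The sketch below is illustrative only: the function name `split_counts` and the round-to-nearest convention are assumptions, not details taken from the paper, so the authors' exact counts may differ by a sample or two.

```python
def split_counts(n_total, train_frac=0.70, val_frac_of_rest=0.30):
    """Return (train, validation, test) counts for a nested split.

    Hypothetical helper: rounding to the nearest integer is an assumption;
    the paper does not state how fractional samples are handled.
    """
    n_train = round(n_total * train_frac)
    n_rest = n_total - n_train                 # samples left after training split
    n_val = round(n_rest * val_frac_of_rest)   # 30% of the remainder
    n_test = n_rest - n_val                    # the rest goes to testing
    return n_train, n_val, n_test


# Cohort of 5,601 samples, as quoted in the Dataset Splits row.
train, val, test = split_counts(5601)
print(train, val, test)
```

Under these assumptions the effective proportions of the full cohort are roughly 70% training, 9% validation, and 21% testing. The simulated scenarios use fixed counts instead (17,000 / 1,000 / 2,000), which corresponds to 85% / 5% / 10% of 20,000 samples.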
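
For readers who want to re-create the configuration quoted in the Experiment Setup row, the reported hyperparameters can be collected in one place, together with a patience-based early-stopping rule. This is a minimal sketch in plain Python, not the authors' code: the names `ReportedSetup` and `EarlyStopping` are hypothetical, and the stopping criterion (stop once the validation loss has failed to improve for `patience` consecutive epochs) is an assumed reading of "early stopping after 30 epochs without model improvement".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the table; field names are hypothetical."""
    hidden_layers_representation: int = 1
    hidden_layers_per_head: int = 3
    nodes_per_layer: int = 100
    rbf_centers: tuple = (0.0, 0.25, 0.5, 0.75, 1.0)  # 5 Gaussian RBFs per layer
    rbf_sigma_init: float = 0.7
    gamma_grid: tuple = (0.1, 0.01)   # gamma is chosen per experiment
    learning_rate: float = 1e-3       # Adam
    dropout: float = 0.1
    batch_size: int = 500
    patience: int = 30                # epochs without validation improvement


class EarlyStopping:
    """Assumed criterion: stop after `patience` epochs with no new best loss."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


setup = ReportedSetup()
stopper = EarlyStopping(patience=setup.patience)
```

In a training loop, `stopper.step(val_loss)` would be called once per epoch and the loop exited when it returns `True`. The values Q = 3 and T = 5 are left out because the excerpt does not say what they control.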